Machine Learning and Cardiology
Heartbeat Classification - Methodology
In this chapter, four possible approaches for translating the heartbeats into a feature set are discussed. Note that these four approaches should not be combined, since they all describe the shape of the heartbeat, each in a different way. In addition to the four ways of representing the heartbeat's morphology as features, further features concerning the heartbeat's characteristics are used with all of these approaches. The so-called RR-interval features indicate the time distance between R waves. For each signal, the overall average time between two successive heartbeats is measured and used as one feature.
Furthermore, for each heartbeat the distance to the previous and to the next R wave is measured.
Moreover, the database contains an annotation indicating whether a P wave occurred or not. Hence, the presence of a P wave is used as a feature.
This makes a total of four additional features, which are combined with the specific features of each of the following approaches:
7.1 Approach 1: Heartbeat Segmentation by Slopes
Using the annotated beat positions $t_i$, one can isolate the $i$-th heartbeat, starting at $(t_{i-1} + t_i)/2$ and ending at $(t_i + t_{i+1})/2$. For classification purposes, each heartbeat needs to be subdivided into $n$ cells. Each cell holds the gradient between two successive points of the subdivided heartbeat. The result is a feature set in which all heartbeats have the same dimensionality $n$.
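The segmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the project: the function name `slope_features`, the default `n_cells=16`, the use of sample indices for the annotations, and the linear resampling step are all hypothetical choices.

```python
import numpy as np

def slope_features(signal, beat_times, i, n_cells=16):
    """Slope features of the i-th heartbeat (hypothetical helper).

    signal     : 1-D sequence of ECG samples
    beat_times : annotated beat positions t_i, given as sample indices
    i          : beat index, with 1 <= i <= len(beat_times) - 2
    """
    signal = np.asarray(signal, dtype=float)
    # Beat boundaries: midpoints to the previous and to the next annotation.
    start = (beat_times[i - 1] + beat_times[i]) // 2
    end = (beat_times[i] + beat_times[i + 1]) // 2
    beat = signal[start:end + 1]
    # Assumption: resample linearly to n_cells + 1 equally spaced points so
    # that every beat yields exactly n_cells values, whatever its duration.
    src = np.linspace(0.0, 1.0, num=len(beat))
    dst = np.linspace(0.0, 1.0, num=n_cells + 1)
    points = np.interp(dst, src, beat)
    # One value per cell: the amplitude change between successive points.
    return np.diff(points)
```

Because every beat is reduced to the same number of cells, beats of different duration become directly comparable feature vectors.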
7.2 Approach 2: QRS Segmentation by Slopes
Using the annotations, another approach is to take only the part of the annotated heartbeat that covers the QRS complex. The intention is that all other relevant information is covered by the other features.
This approach is very similar to the first one, with the difference that instead of the whole heartbeat, only the QRS complex is considered.
7.3 Approach 3: Heartbeat Approximation by Polynomials
When analyzing a heartbeat, it is possible to approximate it by a polynomial, a technique often used to interpolate between a set of given samples. For this purpose, it is important to first resample the heartbeat to a smaller number of samples. The number of coefficients of the approximating polynomial is linearly related to the number of samples, see Equation 7.1.

Calculating such a polynomial becomes rapidly more expensive as the number of data points grows. Once the coefficients of the polynomial approximating the given heartbeat are found, they can be used directly as features. However, it is better to weight these coefficients, since not every coefficient has the same importance. Therefore, each coefficient should be multiplied by a value determined by its order in the polynomial (higher orders are more important). It is also important to discard some of the coefficients, since not all of them are significant enough to represent the data, and keeping too many might result in a feature set that overrepresents only a few characteristics. Finally, the coefficient of $x^0$ should be discarded, since it represents the offset of the heartbeat within the whole signal rather than its shape.
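The steps above (resampling, fitting, weighting, and dropping the constant term) can be sketched as follows. This is only an illustration: it uses a least-squares fit via `numpy.polyfit` rather than the exact relation of Equation 7.1, and the function name `polynomial_features`, the defaults `n_samples=32` and `degree=6`, and the optional `weights` argument are hypothetical.

```python
import numpy as np

def polynomial_features(beat, n_samples=32, degree=6, weights=None):
    """Polynomial coefficients of one heartbeat (hypothetical helper)."""
    beat = np.asarray(beat, dtype=float)
    # Resample the beat to a small, fixed number of samples on [0, 1].
    src = np.linspace(0.0, 1.0, num=len(beat))
    x = np.linspace(0.0, 1.0, num=n_samples)
    y = np.interp(x, src, beat)
    # Assumption: least-squares fit; polyfit returns the highest order first.
    coeffs = np.polyfit(x, y, degree)
    # Discard the x^0 coefficient, which is the last entry.
    coeffs = coeffs[:-1]
    # Optional per-coefficient weights, ordered from highest order to x^1.
    if weights is not None:
        coeffs = coeffs * np.asarray(weights, dtype=float)
    return coeffs
```

Fitting on the normalized interval [0, 1] keeps the coefficients independent of where the beat lies in the recording.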
7.4 Approach 4: Heartbeat Approximation by Sinusoids
Although approximation by polynomials might give good insight into the shape of the heartbeat, a better approximation could be achieved with sinusoids. This is justified by the fact that a heartbeat looks much more like a wave than like a polynomial. The Fast Fourier Transform is a possible technique:

The features to be selected are the first magnitude values, since they represent the most common shape of the beat. Again, only some of these values are kept, for the same reasons as mentioned above. Representing the signal completely would require many more features, but the FFT makes it possible to capture a considerable amount of information in only a few features.
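Selecting the first magnitude values can be sketched as follows. The function name `fft_features`, the defaults `n_samples=64` and `n_features=8`, and the resampling to a fixed length (so that frequency bins are comparable between beats) are assumptions made for this illustration only.

```python
import numpy as np

def fft_features(beat, n_samples=64, n_features=8):
    """First FFT magnitudes of one heartbeat (hypothetical helper)."""
    beat = np.asarray(beat, dtype=float)
    # Assumption: resample to a fixed length so that bin k means the same
    # number of cycles per beat for every heartbeat.
    src = np.linspace(0.0, 1.0, num=len(beat))
    dst = np.linspace(0.0, 1.0, num=n_samples)
    resampled = np.interp(dst, src, beat)
    # Magnitudes of the one-sided spectrum of the real-valued beat.
    magnitudes = np.abs(np.fft.rfft(resampled))
    # Keep only the first (lowest-frequency) magnitudes; index 0 is the
    # constant component of the beat.
    return magnitudes[:n_features]
```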
7.5 Additional Features
In addition to these four possible ways of representing the heartbeat morphology as features, further features concerning the heartbeat characteristics are used. The so-called RR-interval features indicate the time distance between R waves. For each signal, the overall average time between two successive heartbeats is measured and used as one feature.
Furthermore, for each heartbeat the distance to the previous and to the next R wave is measured. Moreover, the database contains an annotation indicating whether a P wave occurred or not. Hence, the presence of a P wave is used as a feature. This means that four features are used in addition to the morphological ones discussed above.
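The four additional features can be computed along the following lines. This is a minimal sketch: the function name `rr_features`, the input format (R-wave times in seconds and one boolean P-wave flag per beat), and the decision to skip the first and last beat are assumptions, not details taken from the project.

```python
def rr_features(r_times, p_wave_present):
    """Four additional features per heartbeat (hypothetical helper).

    r_times        : R-wave times of one signal, in seconds, ascending
    p_wave_present : one boolean per beat, taken from the annotations
    """
    # Average time between two successive heartbeats over the whole signal.
    intervals = [b - a for a, b in zip(r_times, r_times[1:])]
    mean_rr = sum(intervals) / len(intervals)
    features = []
    # Assumption: the first and last beat lack one neighbour and are skipped.
    for i in range(1, len(r_times) - 1):
        features.append((
            mean_rr,                            # signal-wide average RR interval
            r_times[i] - r_times[i - 1],        # distance to the previous R wave
            r_times[i + 1] - r_times[i],        # distance to the next R wave
            1.0 if p_wave_present[i] else 0.0,  # presence of a P wave
        ))
    return features
```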
7.6 K-Nearest Neighbor Classification
The K-nearest neighbor classifier is a conceptually simple yet efficient classifier. Given a dataset, it classifies each new instance by a majority vote among the k instances of the dataset that are nearest to it under a distance d. Choosing the distance measure is an important decision that can be very difficult. In this project, the Mahalanobis distance was used (Equation 7.4). Here, the covariance among the data points is taken into account, which solves the problems of scale and correlation inherent in the Euclidean distance. After the k nearest neighbors are computed, a majority vote determines the class of the considered data point.

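$$d(x, y) = \sqrt{(x - y)^{T} S^{-1} (x - y)} \qquad (7.4)$$
where $p$ is the number of dimensions, $x$ and $y$ are instances of the database with $p$ dimensions each, and $S$ is the $p \times p$ covariance matrix of the data points.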
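A k-nearest neighbor classifier with the Mahalanobis distance can be sketched as follows. This is an illustration under assumptions: the function name `knn_mahalanobis`, the default `k=3`, estimating the covariance from the training instances, and the use of a pseudo-inverse to tolerate a singular covariance estimate are choices made here, not details from the project.

```python
from collections import Counter

import numpy as np

def knn_mahalanobis(train_x, train_y, query, k=3):
    """Classify `query` by majority vote of its k nearest neighbours.

    train_x : array of shape (m, p), one instance per row
    train_y : sequence of m class labels
    query   : array of shape (p,)
    """
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Covariance of the features; the pseudo-inverse is an assumption that
    # keeps the sketch usable when the estimate is singular.
    cov = np.atleast_2d(np.cov(train_x, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    diff = train_x - query
    # Squared Mahalanobis distance from the query to every training instance;
    # the square root is omitted because it does not change the ordering.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    nearest = np.argsort(d2)[:k]
    # Majority vote among the labels of the k nearest instances.
    votes = Counter(train_y[j] for j in nearest)
    return votes.most_common(1)[0][0]
```

An odd value of k avoids ties when only two classes are present.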