Machine Learning and Cardiology

Heartbeat Classification - Preprocessing

The database used for this part of the research, the MIT-BIH Arrhythmia Database, contains the locations of the heartbeats along with annotations. The complete list of annotations can be found in the appendix.

 

6.1 Classes

The MIT-BIH Arrhythmia Database contains over 25 different heartbeat annotations. Under the MIT-BIH standards, these can be grouped into over 15 different classes. The Association for the Advancement of Medical Instrumentation (AAMI) recommends dividing them into 5 classes instead [1]. In this research the AAMI classes were used, and the MIT-BIH classes were converted to this standard. Figure 6.1 shows the AAMI classes and what constitutes each class. The benefit of using this standard (ANSI/AAMI EC57:1998) is that the classification can be applied to all kinds of ECGs that are annotated according to the same standard. It also simplifies the classification process, since fewer classes have to be considered.
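The conversion from MIT-BIH annotations to AAMI classes can be sketched as a simple lookup table. The grouping below is the one commonly used in the literature for ANSI/AAMI EC57; it is assumed, not confirmed, to match Figure 6.1. The names AAMI_CLASSES, MITBIH_TO_AAMI and to_aami are hypothetical and chosen only for this illustration.

```python
# Assumed grouping of MIT-BIH beat symbols into the 5 AAMI classes
# (the grouping commonly used with ANSI/AAMI EC57; check it against Figure 6.1).
AAMI_CLASSES = {
    "N": ["N", "L", "R", "e", "j"],  # normal and bundle branch block beats
    "S": ["A", "a", "J", "S"],       # supraventricular ectopic beats
    "V": ["V", "E"],                 # ventricular ectopic beats
    "F": ["F"],                      # fusion beats
    "Q": ["/", "f", "Q"],            # paced and unclassifiable beats
}

# Invert the table so that each MIT-BIH symbol points to its AAMI class.
MITBIH_TO_AAMI = {
    symbol: aami
    for aami, symbols in AAMI_CLASSES.items()
    for symbol in symbols
}


def to_aami(symbol):
    """Return the AAMI class for a MIT-BIH beat symbol, or None if it is not a beat."""
    return MITBIH_TO_AAMI.get(symbol)


if __name__ == "__main__":
    for symbol in ["N", "L", "V", "/", "+"]:
        print(symbol, "->", to_aami(symbol))
```

Annotations that do not describe a beat (for example rhythm changes) are not in the table and map to None, so they can be filtered out before classification.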

To represent the data, several features were chosen. In the next section, the choice of features is discussed and four kinds of representations of a heartbeat are presented. Furthermore, the classifier and the validation method are explained.

Figure 6.1: Heartbeats grouped by class

 

6.2 Stratification

When the data is split into several folds (for training and validation purposes), there is a danger that the folds no longer reflect the original class proportions of the data. This can happen, for example, when some classes are very rare, or when the data points are ordered in a specific way before they are split. An approach to overcome this problem is stratification. When the folds are created, the stratification algorithm decides for each data point which fold it should be assigned to. This decision is made with the aid of the precomputed class distribution of the overall data and the current class distributions in the different folds. As long as the current proportion of a specific class in a fold is lower than its overall proportion, instances of that class are added to this fold. This process is repeated until every data point has been assigned to a fold.
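The idea can be illustrated with a minimal sketch. It is a simplified greedy variant, not necessarily the algorithm used in this research: rather than comparing proportions explicitly, each data point goes to the fold that currently holds the fewest instances of its class, which keeps every fold close to the overall class distribution. The function name stratified_folds and the example labels are hypothetical.

```python
from collections import Counter


def stratified_folds(labels, k):
    """Split the indices of `labels` into k folds with similar class proportions.

    Simplified greedy variant: each index goes to the fold that so far has
    the fewest instances of its class (ties: the smallest fold, then the
    lowest fold number).
    """
    folds = [[] for _ in range(k)]
    class_counts = [Counter() for _ in range(k)]  # per-fold class counts

    for idx, label in enumerate(labels):
        # Pick the fold in which this class is currently least represented.
        target = min(
            range(k),
            key=lambda f: (class_counts[f][label], len(folds[f])),
        )
        folds[target].append(idx)
        class_counts[target][label] += 1

    return folds


if __name__ == "__main__":
    # Made-up, imbalanced example labels (not taken from the database).
    labels = ["N"] * 90 + ["V"] * 9 + ["F"] * 3
    for number, fold in enumerate(stratified_folds(labels, k=3)):
        print(number, dict(Counter(labels[i] for i in fold)))
```

Because the assignment depends only on the per-fold class counts, the result does not suffer from the data being sorted by class, and rare classes are spread over the folds instead of ending up in a single one.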

 
