/ECG-classification-using-open-data

ECG classification using public data and state-of-the-art 1D CNN models. This work is based on the George Moody Challenge 2020.


Classification of 12-lead ECGs

/img/12_lead_ecg_plot.png

Figure 1: A 12-lead ECG plotted with ecg_plot [1]. The ECG data is from the PTB Diagnostic DB [2].

This project is based on the work we did in the PhysioNet/Computing in Cardiology Challenge 2020. This paper [3] describes the Challenge, and this paper describes our contribution to it.

Data:

The data set in this project contains 43,101 ECGs from six different sources, listed in Table 1.

Table 1: The table lists the six different sources used in the data set in this project

Data set number | Name
1 | China Physiological Signal Challenge 2018
2 | China Physiological Signal Challenge 2018 Extra
3 | St. Petersburg Institute of Cardiological Technics
4 | PTB Diagnostics
5 | PTB-XL
6 | Georgia 12-Lead ECG Challenge Database

Preprocessing of data:

The data is used in two different ways by the models in this project. The first method, used by the Convolutional Neural Networks, is to use the ECG signals as they are in the original data set. The second method, used by the two ensemble models, is to extract features from the ECGs and create a table with n rows and m columns, where n = number of ECG recordings and m = number of features. The ECG features are extracted with an ECG-featurizer [4]. The featurized data can be found here and the script for making the dataset is here:

makedataset
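The n-by-m feature table described above can be sketched in a few lines. This is only an illustration of the table layout: the summary statistics below (mean, standard deviation, minimum, maximum per lead) are hypothetical stand-ins and are not the features computed by the ECG-featurizer [4]. The toy recordings are placeholder numbers, not real ECG data.

```python
from statistics import fmean, pstdev

# Hypothetical per-lead features, chosen only to illustrate the table shape.
# The project itself uses the ECG-featurizer [4] for this step.
FEATURES = {
    "mean": fmean,
    "std": pstdev,
    "min": min,
    "max": max,
}


def featurize(recording):
    """Turn one recording (a list of leads, each a list of samples) into a flat feature row."""
    row = []
    for lead in recording:
        for func in FEATURES.values():
            row.append(float(func(lead)))
    return row


def build_feature_table(recordings):
    """Return a table with n rows (one per recording) and m columns (one per feature)."""
    return [featurize(rec) for rec in recordings]


# Two toy 2-lead recordings of different lengths (placeholder values).
recordings = [
    [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]],
    [[0.0, 2.0, 4.0, 6.0], [-1.0, 0.0, 1.0, 0.0]],
]
table = build_feature_table(recordings)
n, m = len(table), len(table[0])
print(n, m)  # 2 8
```

Because every recording is reduced to the same m features, recordings of different lengths end up as rows of equal width, which is what a tabular ensemble model needs as input.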

Get access to the data:

To get access to the data used in this study, you can either download it from https://physionetchallenges.github.io/2020/#data or download the same data set from Kaggle. To run the code in this repository, sign up for a Kaggle account, create a Kaggle API token, and use it to access the Kaggle data set from Google Colab. Google Colab Pro was used to get sufficient GPU power and enough runtime.
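Once the data is downloaded, each recording's metadata can be read from its header file. The sketch below assumes the WFDB-style header layout used for the Challenge data (a first line with record name, number of leads, sampling frequency and number of samples, followed by comment lines such as "#Dx:" holding the diagnosis codes). The function name and the example header are made up for illustration and are not taken from this repository.

```python
def parse_header(text):
    """Parse a WFDB-style header string into a small dict (assumed layout)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # First line: record name, number of leads, sampling frequency, number of samples.
    record_name, n_leads, fs, n_samples = lines[0].split()[:4]
    info = {
        "record": record_name,
        "n_leads": int(n_leads),
        "fs": float(fs),
        "n_samples": int(n_samples),
        "dx": [],
    }
    for ln in lines[1:]:
        # Diagnosis codes are stored as a comma-separated list on the "#Dx:" line.
        if ln.startswith("#Dx:"):
            codes = ln[len("#Dx:"):].split(",")
            info["dx"] = [code.strip() for code in codes if code.strip()]
    return info


# Illustrative header only; the signal specification lines are omitted.
example = """A0001 12 500 7500
#Age: 74
#Sex: Male
#Dx: 426783006,59118001
"""
info = parse_header(example)
print(info["n_leads"], info["fs"], info["dx"])
```

The diagnosis codes are kept as strings, since they act as class labels rather than numbers.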

How to get your Kaggle API token:

  1. Log in to your Kaggle account or sign up here
  2. Click the "Account" option to the left of the "Edit profile" button.
  3. Scroll down to the API section and click the "Create New API Token" button.
  4. A file named kaggle.json is now downloaded. This is your API token.
  5. Upload the kaggle.json file to the Google Colab notebook. You can then download data sets from Kaggle.
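After uploading kaggle.json, the token has to be placed where the Kaggle client looks for it. The following is a minimal sketch, assuming the default location ~/.kaggle/kaggle.json; the helper name install_kaggle_token and its config_dir parameter are hypothetical and not part of this repository.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(token_path, config_dir=None):
    """Copy kaggle.json into the Kaggle config directory with owner-only permissions."""
    # Default to ~/.kaggle, the location the Kaggle client reads by default.
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    # The Kaggle client warns if the token is readable by other users.
    os.chmod(target, 0o600)
    return target


# Usage in a notebook, after uploading the file to the working directory:
# install_kaggle_token("kaggle.json")
```

With the token in place, a data set can be fetched with the Kaggle command-line client, for example `kaggle datasets download -d <owner>/<dataset-name>`, where the placeholder stands for the data set linked from this repository.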

Models:

10-fold cross-validated models:

Model number | Model | Link to Google Colab notebook | Link to notebook on GitHub
1 | FCN | FCN | Notebook
2 | Encoder | Encoder | Notebook
3 | FCN + MLP | FCN-MLP | Notebook
4 | Encoder + MLP | Encoder-MLP | Notebook
5 & 6 | Encoder + FCN (and Encoder + FCN + rule-based model) | FCN-Encoder | Notebook
7 & 8 | Encoder + FCN + MLP (and Encoder + FCN + MLP + rule-based model) | Encoder-FCN-MLP | Notebook
9 | Ensemble model - 12 leads | ensemble12lead | Notebook
10 | Ensemble model - 2 leads | ensemble2lead | Notebook

** Please use these two URLs for the two ensemble models:

12-lead:

https://www.kaggle.com/bjoernjostein/fys-stk-oblig3-physionet-challenge-2021-starter

2-lead:

https://www.kaggle.com/bjoernjostein/fys-stk-oblig3-physionet-challenge-2021-will-2-do

or see the same models here on GitHub

12-lead:

https://github.com/Bsingstad/FYS-STK4155-oblig3/blob/master/Notebooks/Models/EnsembleModel12lead.ipynb

2-lead:

https://github.com/Bsingstad/FYS-STK4155-oblig3/blob/master/Notebooks/Models/EnsembleModel2lead.ipynb
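For readers unfamiliar with 10-fold cross-validation, the splitting step can be sketched as below. This is a plain, unstratified k-fold split written for illustration; the notebooks may split the data differently, and the function name k_fold_indices is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled indices gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


# 25 toy samples split into 10 folds.
splits = list(k_fold_indices(n_samples=25, k=10))
print(len(splits))  # 10
```

Each sample is used for validation exactly once and for training in the other nine folds, so every model in the table is evaluated on all recordings without ever being tested on data it was trained on in the same fold.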

Plot the cross-validation results:

The results from the cross-validated models can be plotted with this notebook: plot. The figures can be found here.

Explainable AI:

Explanation - Convolutional Neural Network:

https://colab.research.google.com/drive/13lR2Rx7mHLBlhbDzyViMPIukT2wV5jsj?usp=sharing

https://github.com/Bsingstad/FYS-STK4155-oblig3/blob/master/Notebooks/Explainable%20AI/Encoder_Physionet_Challenge_explain.ipynb

Explanation - Ensemble Model:

https://www.kaggle.com/bjoernjostein/fys-stk-oblig3-physionet-challenge-2021-explain

https://github.com/Bsingstad/FYS-STK4155-oblig3/blob/master/Notebooks/Explainable%20AI/fys-stk-oblig3-physionet-challenge-2021-explain.ipynb

Paper:

The paper describing the work in this project can be found here:

latex-file

License:

Licensed under the Apache 2.0 License

References:

[1] ECG plot: https://github.com/dy1901/ecg_plot
[2] PTB Diagnostic DB: Bousseljot R, Kreiseler D, Schnabel A. Nutzung der EKG-Signaldatenbank CARDIODAT der PTB über das Internet. Biomedizinische Technik, Band 40, Ergänzungsband 1 (1995) S 317 (https://physionet.org/content/ptbdb/1.0.0/)
[3] Perez Alday, Erick A, Annie Gu, Amit J Shah, Chad Robichaux, An-Kwok Ian Wong, Chengyu Liu, Feifei Liu, et al. "Classification of 12-lead ECGs: the PhysioNet/Computing in Cardiology Challenge 2020". Physiological Measurement, 11 November 2020. https://doi.org/10.1088/1361-6579/abc960.
[4] ECG-Featurizer: https://github.com/ECG-featurizer/ECG-featurizer