
A Python package for extracting features from ECGs




A method to extract features from electrocardiographic recordings

This package turns ECG recordings into tabular data by computing a large set of features for each recording. It is built on WFDB [1] and NeuroKit2 [2].

Project Status: Active. The project has reached a stable, usable state and is being actively developed.


To install ECG-featurizer, run this command in your terminal:

pip install ECG-featurizer


Featurize .dat-files:

from ECGfeaturizer import featurize as ef

# Make ECG-featurizer object
Feature_object = ef.get_features()

# Preprocess the data (filter, find peaks, etc.)

Featurize .mat-files:

from ECGfeaturizer import featurize as ef

number_of_ECGs = <the number of ECGs>
directory = "<your dir>"

# Make ECG-featurizer object
Feature_object = ef.get_features()

# Preprocess the data (filter, find peaks, etc.)
My_features = Feature_object.featurizer_mat(num_features=number_of_ECGs, mat_dir=directory)


A NumPy array with the names of the ECG recordings in the directory. Each recording should consist of two files: one that holds the recording as a time series and one that holds metadata about the patient and the measurement. This is the standard format for WFDB and PhysioNet files [1] [3].

Supported input files:

Input data         Supported file format
ECG recordings     .dat files
Patient metadata   .hea files
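
Because every recording needs both files, it can help to check the directory before featurizing. The sketch below lists record names that are missing one file of the pair. The helper `find_incomplete_records` is hypothetical and not part of ECG-featurizer; it uses only the Python standard library.

```python
from pathlib import Path


def find_incomplete_records(directory):
    """Return sorted record names that lack either the .dat or the .hea file."""
    folder = Path(directory)
    dat_names = {p.stem for p in folder.glob("*.dat")}
    hea_names = {p.stem for p in folder.glob("*.hea")}
    # Symmetric difference: names present in exactly one of the two sets.
    return sorted(dat_names ^ hea_names)
```

An empty list means that every .dat file has a matching .hea file and vice versa.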


A NumPy array of labels (diagnoses), one for each ECG recording. The labels array must have the same length as the features array:

len(labels) == len(features)
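
One way to keep the two arrays aligned is to look up each label by record name. This is a minimal sketch; the record names and diagnosis codes are made up for illustration, and the mapping `diagnosis_by_record` is an assumption about how the labels are stored.

```python
import numpy as np

# Hypothetical record names and diagnoses, for illustration only.
diagnosis_by_record = {"A0001": "AF", "A0002": "Normal", "A0003": "AF"}

features = np.array(sorted(diagnosis_by_record))
# Look up each label by record name so that labels[i] belongs to features[i].
labels = np.array([diagnosis_by_record[name] for name in features])

assert len(labels) == len(features)
```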


A string with the path to the directory that contains the recordings listed in the features array. If the folder structure looks like this:

├── ECG-recordings
│   ├── A0001.hea
│   ├── A0001.dat
│   ├── A0002.hea
│   ├── A0002.dat
│   └── Axxxx.dat

then the features and directory variables could be:

features[0] = "A0001"

directory = "./mypath/ECG-recordings/"
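
The features array can be built from the directory contents rather than typed by hand. The sketch below assumes that record names are the file names without their extension, as in the example above; `list_record_names` is a hypothetical helper, not a function of ECG-featurizer.

```python
from pathlib import Path

import numpy as np


def list_record_names(directory):
    """Return the sorted record names (file stems) of all .hea files."""
    return np.array(sorted(p.stem for p in Path(directory).glob("*.hea")))
```

With the folder structure shown above, `list_record_names("./mypath/ECG-recordings/")` would start with `"A0001"`.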


The demographic data used by this function are age and gender. A DataFrame with the following three columns should be passed to the featurizer() function.

   age   gender  filename_hr
0  11.0  1       "A0001"
1  57.0  0       "A0002"
2  94.0  0       "A0003"
3  34.0  1       "A0004"

The strings in the filename_hr column should be the same as the strings in the features array. In this example gender is encoded as a binary value such that

1 = Female, 0 = Male
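
The table above could be built with pandas as in the following sketch. The variable names `demo_data` and `features` are assumptions chosen for illustration; the final check confirms that every record name has a row in the filename_hr column.

```python
import numpy as np
import pandas as pd

features = np.array(["A0001", "A0002", "A0003", "A0004"])

demo_data = pd.DataFrame(
    {
        "age": [11.0, 57.0, 94.0, 34.0],
        "gender": [1, 0, 0, 1],  # 1 = Female, 0 = Male
        "filename_hr": ["A0001", "A0002", "A0003", "A0004"],
    }
)

# Every record name should appear in the filename_hr column.
missing = sorted(set(features.tolist()) - set(demo_data["filename_hr"]))
assert not missing, f"No demographic data for: {missing}"
```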


Other examples:


GPLv3 license


Citation guidelines will be added later.




[1] WFDB: https://github.com/MIT-LCP/wfdb-python
[2] Makowski, D., Pham, T., Lau, Z. J., Brammer, J. C., Lespinasse, F., Pham, H., Schölzel, C., & Chen, S. H. A. (2020). NeuroKit2: A Python Toolbox for Neurophysiological Signal Processing. Retrieved March 28, 2020, from https://github.com/neuropsychology/NeuroKit
[3] Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220 [Circulation Electronic Pages; http://circ.ahajournals.org/content/101/23/e215.full]; 2000 (June 13). PMID: 10851218; doi: 10.1161/01.CIR.101.23.e215