ecg-selfsupervised

Self-supervised representation learning from 12-lead ECG data

This is the official code repository accompanying our paper on Self-supervised representation learning from 12-lead ECG data.

For a detailed description of technical details and experimental results, please refer to our paper:

Temesgen Mehari and Nils Strodthoff, Self-supervised representation learning from 12-lead ECG data, Computers in Biology and Medicine 141 (2022) 105114.

@article{Mehari:2021Self,
    doi = {10.1016/j.compbiomed.2021.105114},
    url = {https://doi.org/10.1016/j.compbiomed.2021.105114},
    year = {2022},
    month = feb,
    publisher = {Elsevier {BV}},
    volume = {141},
    pages = {105114},
    author = {Temesgen Mehari and Nils Strodthoff},
    title = {Self-supervised representation learning from 12-lead {ECG} data},
    journal = {Computers in Biology and Medicine},
    eprint = {2103.12676},
    archivePrefix={arXiv},
    primaryClass={eess.SP}
}

Usage information

Preparation

  1. Install the dependencies from ecg_selfsupervised.yml by running conda env create -f ecg_selfsupervised.yml and activate the environment via conda activate ecg_selfsupervised
  2. Follow the instructions in data_preprocessing.ipynb on how to download and preprocess the ECG datasets; in the following, we assume for definiteness that the preprocessed dataset folders can be found at ./data/cinc, ./data/zheng, ./data/ribeiro, and ./data/ptb_xl (a minimal check of this layout is sketched below).
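
If you want to double-check this layout before launching any runs, a minimal sketch (not part of the repository, standard library only) could look as follows; the folder names are simply the ones assumed above:

# Minimal sanity check (not part of the repository): verify that the
# preprocessed dataset folders produced via data_preprocessing.ipynb exist.
from pathlib import Path

expected = ["./data/cinc", "./data/zheng", "./data/ribeiro", "./data/ptb_xl"]
for folder in expected:
    print(f"{folder}: {'ok' if Path(folder).is_dir() else 'MISSING'}")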

A1. Pretraining CPC on All

Pretrain a CPC model on the full dataset collection (will take about 6 days on a Tesla V100 GPU): python main_cpc_lightning.py --data ./data/cinc --data ./data/zheng --data ./data/ribeiro --normalize --epochs 1000 --output-path=./runs/cpc/all --lr 0.0001 --batch-size 32 --input-size 1000 --fc-encoder --negatives-from-same-seq-only
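
To make the objective behind this command more concrete, here is a minimal, self-contained sketch of the CPC/InfoNCE idea: a strided convolutional encoder produces latents z_t, a GRU summarizes them into context vectors c_t, and per-step linear predictors are scored against negatives. This is not the repository's architecture; all module names, sizes, and shapes are illustrative assumptions, and the sketch draws negatives from other samples in the batch rather than from the same sequence (in contrast to the --negatives-from-same-seq-only option used above).

# Illustrative sketch of the contrastive predictive coding (CPC) / InfoNCE
# objective on a batch of ECG windows. NOT the repository's implementation;
# all module sizes are made up for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCPC(nn.Module):
    def __init__(self, n_leads=12, latent_dim=64, context_dim=64, pred_steps=4):
        super().__init__()
        # strided 1d conv encoder: raw signal -> sequence of latent vectors z_t
        self.encoder = nn.Sequential(
            nn.Conv1d(n_leads, latent_dim, kernel_size=10, stride=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(latent_dim, latent_dim, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
        )
        # autoregressive model: summarizes z_1..z_t into a context vector c_t
        self.ar = nn.GRU(latent_dim, context_dim, batch_first=True)
        # one linear predictor per prediction step k
        self.predictors = nn.ModuleList(
            [nn.Linear(context_dim, latent_dim) for _ in range(pred_steps)]
        )

    def forward(self, x):                      # x: (batch, leads, time)
        z = self.encoder(x).transpose(1, 2)    # (batch, steps, latent_dim)
        c, _ = self.ar(z)                      # (batch, steps, context_dim)
        loss, n_terms = 0.0, 0
        for k, w in enumerate(self.predictors, start=1):
            pred = w(c[:, :-k])                # predictions for z_{t+k}
            target = z[:, k:]                  # true future latents
            # scores[b, t, j]: similarity of prediction (b, t) with latent of sample j
            scores = torch.einsum("btd,jtd->btj", pred, target)
            labels = torch.arange(x.size(0), device=x.device)
            labels = labels.unsqueeze(1).expand(-1, scores.size(1))
            # InfoNCE: the matching sample is the positive, the other samples
            # at the same time step act as negatives
            loss = loss + F.cross_entropy(scores.reshape(-1, x.size(0)), labels.reshape(-1))
            n_terms += 1
        return loss / n_terms

model = TinyCPC()
loss = model(torch.randn(8, 12, 1000))   # 8 windows of 1000 samples, 12 leads
loss.backward()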

A2. Finetuning CPC on PTB-XL

  1. Finetune just the classification head: python main_cpc_lightning.py --data ./data/ptb_xl --normalize --epochs 50 --output-path=./runs/cpc/all_ptbxl --lr 0.001 --batch-size 128 --input-size 250 --finetune --pretrained ./runs/cpc/all/version_0/best_model.ckpt --finetune-dataset ptbxl_all --fc-encoder --train-head-only
  2. Finetune the entire model with discriminative learning rates (a minimal PyTorch sketch of this technique follows this list): python main_cpc_lightning.py --data ./data/ptb_xl --epochs 20 --output-path=./runs/cpc/all_ptbxl --lr 0.0001 --batch-size 128 --normalize --input-size 250 --finetune --finetune-dataset ptbxl_all --fc-encoder --pretrained ./runs/cpc/all_ptbxl/version_0/best_model.ckpt
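
Discriminative learning rates can be realized in plain PyTorch via optimizer parameter groups, as in the following sketch; the model, label count, and learning rates are illustrative stand-ins and not the repository's code:

# Illustrative sketch of discriminative learning rates: earlier layers get
# smaller learning rates than the classification head. Not the repository's code.
import torch
import torch.nn as nn

# hypothetical stand-in for a pretrained encoder plus classification head
model = nn.Sequential(
    nn.Conv1d(12, 64, kernel_size=7, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(64, 71),                     # e.g. 71 PTB-XL "all" labels
)

base_lr = 1e-4
param_groups = [
    {"params": model[0].parameters(), "lr": base_lr / 10},  # encoder: small lr
    {"params": model[-1].parameters(), "lr": base_lr},      # head: full lr
]
optimizer = torch.optim.AdamW(param_groups)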

B1. Pretraining SimCLR, BYOL, and SwAV on All

Pretrain an xresnet1d50 model on the full dataset by running the corresponding lightning module:

SimCLR: python custom_simclr_bolts.py --batch_size 4096 --epochs 2000 --precision 16 --trafos RandomResizedCrop TimeOut --datasets ./data/cinc ./data/zheng ./data/ribeiro --log_dir=experiment_logs

BYOL: python custom_byol_bolts.py --batch_size 4096 --epochs 2000 --precision 16 --trafos RandomResizedCrop TimeOut --datasets ./data/cinc ./data/zheng ./data/ribeiro --log_dir=experiment_logs

SwAV: python custom_swav_bolts.py --batch_size 4096 --epochs 2000 --precision 16 --trafos RandomResizedCrop TimeOut --datasets ./data/cinc ./data/zheng ./data/ribeiro --log_dir=experiment_logs

Transformations can be combined by attaching them sequentially, as shown in the commands above. One can choose from [GaussianNoise, GaussianBlur, RandomResizedCrop, TimeOut, ChannelResize, DynamicTimeWarp, BaselineWander, PowerlineNoise, EMNoise, BaselineShift]; the transformations are described in detail in the appendix of the paper. All of the lightning modules come with a number of (hyper)parameters. Feel free to run python *lightning_module* --help to get descriptions of the parameters.
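
As an illustration of what such a transformation does, here is a hedged re-implementation of the TimeOut idea (a random contiguous segment of the signal is zeroed out); the function name, parameters, and default ratio range are assumptions for this sketch, not the repository's exact definition:

# Illustrative re-implementation (not the repository's code) of the TimeOut
# transformation: set a random contiguous segment of the signal to zero.
import torch

def time_out(x, crop_ratio_range=(0.0, 0.5)):
    """x: tensor of shape (leads, time); returns an augmented copy."""
    leads, t = x.shape
    lo, hi = crop_ratio_range
    ratio = torch.empty(1).uniform_(lo, hi).item()
    length = int(ratio * t)
    start = torch.randint(0, t - length + 1, (1,)).item() if length < t else 0
    x = x.clone()
    x[:, start:start + length] = 0.0
    return x

augmented = time_out(torch.randn(12, 1000))   # one 12-lead window of 1000 samples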

The results of the runs will be written to the log_dir in the following form:

experiment_logs
└── Wed Dec 2 17:06:54 2020_simclr_696_RRC TO
    ├── checkpoints
    │   ├── epoch=1654.ckpt
    │   └── model.ckpt
    ├── events.out.tfevents.1606925231
    └── hparams.yaml

2 directories, 5 files

While hparams.yaml and the event file store the hyperparameters and TensorBoard logs of the run, respectively, the trained models are stored in the checkpoints directory. We store two models: the best model according to the validation loss and the last model after training.
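
If you want to inspect such a run programmatically, the following sketch loads the hyperparameters and checkpoint; it assumes a standard PyTorch Lightning checkpoint layout (a dict with a state_dict entry), requires PyYAML, and uses the example run directory shown above:

# Hedged sketch: inspect a pretraining run's artifacts.
import torch
import yaml   # PyYAML

run_dir = "experiment_logs/Wed Dec 2 17:06:54 2020_simclr_696_RRC TO"

with open(f"{run_dir}/hparams.yaml") as f:
    hparams = yaml.safe_load(f)
print(hparams)

# newer PyTorch versions may additionally require weights_only=False here
ckpt = torch.load(f"{run_dir}/checkpoints/model.ckpt", map_location="cpu")
print(list(ckpt.get("state_dict", ckpt).keys())[:5])   # first few weight names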

B2. Finetuning SimCLR, BYOL and SwAV (and CPC) on PTB-XL

Finetune a given model by running

python eval.py --method simclr --model_file "experiment_logs/Wed Dec 2 17:06:54 2020_simclr_696_RRC TO/checkpoints/model.ckpt" --batch_size 128 --use_pretrained --f_epochs 50 --dataset ./data/ptb_xl

and similarly perform a linear evaluation by running

python eval.py --method simclr --model_file "experiment_logs/Wed Dec 2 17:06:54 2020_simclr_696_RRC TO/checkpoints/model.ckpt" --batch_size 128 --use_pretrained --l_epochs 50 --linear_evaluation --dataset ./data/ptb_xl
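
Conceptually, linear evaluation freezes the pretrained encoder and trains only a linear classifier on top of its features. The following sketch illustrates this idea in plain PyTorch; the encoder, label count, and shapes are illustrative assumptions and are not taken from eval.py:

# Conceptual sketch of linear evaluation (not eval.py itself): the pretrained
# encoder is frozen and only a linear classifier on its features is trained.
import torch
import torch.nn as nn

encoder = nn.Sequential(                 # hypothetical stand-in for xresnet1d50
    nn.Conv1d(12, 128, kernel_size=7, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)
for p in encoder.parameters():
    p.requires_grad = False              # freeze the pretrained backbone
encoder.eval()

head = nn.Linear(128, 71)                # e.g. 71 PTB-XL labels, multi-label
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

x, y = torch.randn(16, 12, 250), torch.randint(0, 2, (16, 71)).float()
with torch.no_grad():
    feats = encoder(x)                   # features from the frozen encoder
loss = criterion(head(feats), y)
loss.backward()
optimizer.step()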

The most important parameters are the location of the model you want to finetune and the method that was used during pretraining. Again, consider running python eval.py --help to get an overview of the available parameters. The results are written to the model directory:

experiment_logs
└── Wed Dec 2 17:06:54 2020_simclr_696_RRC TO
    ├── checkpoints
    │   ├── epoch=1654.ckpt
    │   ├── model.ckpt
    │   ├── n=1_f=8_fin_finetuned
    │   │   └── finetuned.pt
    │   └── n=1_f=8_res_fin.pkl
    ├── events.out.tfevents.1606925231
    └── hparams.yaml

3 directories, 7 files

The script prints the results, saves the finetuned model in a new folder next to the pretrained model, and writes a pickle file containing the results of the evaluation.
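
The results pickle can be inspected with the standard library; since its exact structure is not documented here, the sketch below only unpickles the object from the example run above and prints it:

# Hedged sketch: load the evaluation results pickle written next to the
# checkpoints. The structure of the pickled object is not documented here,
# so we only unpickle it and print what it contains.
import pickle

results_path = ("experiment_logs/Wed Dec 2 17:06:54 2020_simclr_696_RRC TO/"
                "checkpoints/n=1_f=8_res_fin.pkl")
with open(results_path, "rb") as f:
    results = pickle.load(f)
print(type(results))
print(results)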

Pretrained models

For each method (SimCLR, BYOL, CPC), we provide the best-performing pretrained model after pretraining on All: link to datacloud. More models are available from the authors upon request.