GraphS4mer


Modeling Multivariate Biosignals With Graph Neural Networks and Structured State Space Models

Siyi Tang, Jared A Dunnmon, Qu Liangqiong, Khaled K Saab, Tina Baykaner, Christopher Lee-Messer, Daniel L Rubin. Proceedings of the Conference on Health, Inference, and Learning, PMLR 209:50-71, 2023. (Best Paper Award)

https://proceedings.mlr.press/v209/tang23a/tang23a.pdf


Setup

This codebase requires Python ≥ 3.9, PyTorch ≥ 1.12.0, and PyG (PyTorch Geometric). Please refer to the PyTorch and PyG installation guides. The remaining dependencies are listed in requirements.txt and can be installed via pip install -r requirements.txt.
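As a concrete example, a typical setup might look like the following (the exact PyTorch and PyG install commands depend on your CUDA version, so please check the official installation guides first):

pip install torch
pip install torch_geometric
pip install -r requirements.txt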


Datasets

TUSZ

The TUSZ dataset is publicly available and can be accessed from https://isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml after filling out the data request form. We use TUSZ v1.5.2 in this study.

TUSZ data preprocessing

First, we resample all EEG signals in TUSZ to 200 Hz. To do so, run:

python data/preprocess/resample_tuh.py --raw_edf_dir {dir-to-tusz-edf-files} --save_dir {dir-to-resampled-signals}
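For reference, the resampling is conceptually equivalent to the sketch below (illustrative only, assuming the EEG is already loaded as a NumPy array of shape (channels, time); the actual data/preprocess/resample_tuh.py script also handles reading the EDF files and saving the resampled signals):

import numpy as np
from scipy.signal import resample

def resample_to_200hz(signal: np.ndarray, orig_freq: float) -> np.ndarray:
    """Resample a (channels, time) EEG array from orig_freq to 200 Hz via Fourier-domain resampling."""
    target_freq = 200
    num_samples = int(signal.shape[1] / orig_freq * target_freq)
    return resample(signal, num_samples, axis=1)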

DOD-H

The DOD-H dataset is publicly available and can be downloaded from this repo.

ICBEB

The ICBEB dataset is publicly available and can be downloaded using this script from this repo.

ICBEB data preprocessing

We follow this repo to split the ICBEB dataset into train/validation/test sets, downsample the ECGs to 100 Hz, and obtain nine ECG class labels. To do so, run:

python data/preprocess/preprocess_icbeb.py --raw_data_dir <raw-icbeb-data-dir> --output_dir <icbeb-data-dir> --sampling_freq 100
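For illustration, the downsampling step roughly corresponds to the sketch below (a minimal sketch assuming the raw ICBEB recordings are sampled at 500 Hz; the actual data/preprocess/preprocess_icbeb.py script also performs the train/validation/test split and the label mapping):

import numpy as np
from scipy.signal import resample_poly

def downsample_to_100hz(ecg: np.ndarray, orig_freq: int = 500) -> np.ndarray:
    """Downsample a (time, leads) ECG array from orig_freq to 100 Hz with polyphase filtering."""
    # 500 Hz -> 100 Hz is an integer factor of 5, so only a 'down' factor is needed.
    return resample_poly(ecg, up=1, down=orig_freq // 100, axis=0)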

Model Training

The scripts folder contains example scripts for training GraphS4mer on the three datasets. These scripts have been tested on a single NVIDIA A100 GPU and a single NVIDIA TITAN RTX GPU. If your GPU has less memory, decrease the batch size and set accumulate_grad_batches to a value > 1, as illustrated in the sketch below.
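The point of accumulate_grad_batches is to keep the effective batch size unchanged: halving the batch size while accumulating gradients over two batches is roughly equivalent to training with the original batch size. Below is a minimal, self-contained PyTorch sketch of the idea (the linear model, data, and hyperparameters are placeholders, not part of GraphS4mer; the training scripts handle this for you):

import torch
from torch import nn

# Toy model and data, purely to illustrate gradient accumulation.
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
batches = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(8)]

accumulate_grad_batches = 2  # effective batch size = 4 * 2 = 8

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = loss_fn(model(x), y) / accumulate_grad_batches  # scale so accumulated gradients match a larger batch
    loss.backward()                                        # gradients add up across backward() calls
    if (step + 1) % accumulate_grad_batches == 0:
        optimizer.step()
        optimizer.zero_grad()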

Model training on TUSZ dataset

To train GraphS4mer on the TUSZ dataset, specify <dir-to-resampled-signals>, <preproc-save-dir>, and <your-save-dir> in scripts/run_tuh.sh, then run the following:

bash ./scripts/run_tuh.sh

Note that the first time you run this script, it preprocesses the resampled signals by sliding a 60-s window without overlap and saves the resulting 60-s EEG clips and their seizure/non-seizure labels as PyG data objects in <preproc-save-dir>.
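Conceptually, this preprocessing resembles the sketch below (illustrative only; the function and variable names are assumptions, and the actual preprocessing also attaches the additional fields the model expects):

import torch
from torch_geometric.data import Data

def make_clips(signal: torch.Tensor, label: int, freq: int = 200, clip_len_s: int = 60):
    """Slice a (channels, time) EEG recording into non-overlapping 60-s clips wrapped as PyG Data objects."""
    clip_len = freq * clip_len_s                # 200 Hz * 60 s = 12000 samples per clip
    num_clips = signal.shape[1] // clip_len     # trailing partial windows are dropped
    clips = []
    for i in range(num_clips):
        x = signal[:, i * clip_len:(i + 1) * clip_len]
        clips.append(Data(x=x, y=torch.tensor([label])))  # seizure (1) / non-seizure (0) label
    return clips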

Model training on DOD-H dataset

To train GraphS4mer on the DOD-H dataset, specify <dir-to-dodh-data> and <your-save-dir> in scripts/run_dodh.sh, then run:

bash ./scripts/run_dodh.sh

Model training on ICBEB dataset

To train GraphS4mer on the ICBEB dataset, specify <icbeb-data-dir> and <your-save-dir> in scripts/run_icbeb.sh, then run:

bash ./scripts/run_icbeb.sh

Updates

  • 2023-03: Traffic forecasting experiments have been moved to the traffic branch.

Reference

If you use this codebase, or otherwise find our work valuable, please cite:


@InProceedings{pmlr-v209-tang23a,
  title     = {Modeling Multivariate Biosignals With Graph Neural Networks and Structured State Space Models},
  author    = {Tang, Siyi and Dunnmon, Jared A and Liangqiong, Qu and Saab, Khaled K and Baykaner, Tina and Lee-Messer, Christopher and Rubin, Daniel L},
  booktitle = {Proceedings of the Conference on Health, Inference, and Learning},
  pages     = {50--71},
  year      = {2023},
  editor    = {Mortazavi, Bobak J. and Sarker, Tasmie and Beam, Andrew and Ho, Joyce C.},
  volume    = {209},
  series    = {Proceedings of Machine Learning Research},
  month     = {22 Jun--24 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v209/tang23a/tang23a.pdf},
  url       = {https://proceedings.mlr.press/v209/tang23a.html},
}