ML4H2020

Repository for projects of the ETH Zürich course "Machine Learning for Health Care (Spring 2020)" (lecture page).

Authors:
Han Bai
Nora Moser
Martin Tschechne (martints@ethz.ch)

Project 1 - ECG Time Series

Classifying ECG signals of the MIT-BIH Arrhythmia Dataset and the PTB Diagnostic ECG Database with Recurrent Neural Networks, and using Transfer Learning techniques to improve predictive performance.
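The difference between retraining with frozen and with unfrozen base layers can be sketched in Keras roughly as follows. This is a minimal sketch, not the repository's actual model: the layer sizes, the input length of 187 samples, the class counts, and the function names `build_base` and `build_classifier` are all assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_base(timesteps=187):
    """Feature extractor (CNN + LSTM). All sizes are illustrative assumptions."""
    inputs = keras.Input(shape=(timesteps, 1))
    x = layers.Conv1D(16, kernel_size=5, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.LSTM(32)(x)
    return keras.Model(inputs, x, name="base")


def build_classifier(base, n_classes, freeze_base):
    """Attach a new fully connected head to a base that is frozen or trainable."""
    # Setting trainable on the base propagates to all of its layers.
    base.trainable = not freeze_base
    outputs = layers.Dense(n_classes, activation="softmax")(base.output)
    model = keras.Model(base.input, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# 1) Pre-train on the source dataset (class count is an assumption).
base = build_base()
source_model = build_classifier(base, n_classes=5, freeze_base=False)
# source_model.fit(x_source, y_source, ...)

# 2) Reuse the base on the target dataset with a fresh head.
#    freeze_base=True trains only the new head; False fine-tunes everything.
target_model = build_classifier(base, n_classes=2, freeze_base=True)
# target_model.fit(x_target, y_target, ...)
```

With `freeze_base=True` only the weights of the new dense head are updated, while `freeze_base=False` also lets the pre-trained convolutional and recurrent weights adapt to the target data.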

Results

Models	MIT-BIH^*	PTBDB^*	PTBDB^°	PTBDB^†
LSTM + FC	F1: 0.184 Acc: 0.823	F1: 0.787 Acc: 0.776 AUROC: 0.808 AUPRC: 0.934	F1: 0.419 Acc: 0.722 AUROC: 0.5 AUPRC: 0.861	F1: 0.371 Acc: 0.565 AUROC: 0.397 AUPRC: 0.805
CNN + LSTM + FC	F1: 0.868 Acc: 0.971	F1: 0.940 Acc: 0.951 AUROC: 0.947 AUPRC: 0.982	F1: 0.988 Acc: 0.990 AUROC: 0.988 AUPRC: 0.996	F1: 0.992 Acc: 0.994 AUROC: 0.990 AUPRC: 0.996
LSTM + XGB^‡	F1: 0.875 Acc: 0.976	F1: 0.971 Acc: 0.977 AUROC: 0.968 AUPRC: 0.988	F1: 0.963 Acc: 0.970 AUROC: 0.955 AUPRC: 0.983	-
CNN + LSTM + XGB^‡	F1: 0.916 Acc: 0.985	F1: 0.983 Acc: 0.986 AUROC: 0.980 AUPRC: 0.993	F1: 0.981 Acc: 0.990 AUROC: 0.977 AUPRC: 0.991	-
XGB	F1: 0.896 Acc: 0.979	F1: 0.970 Acc: 0.976 AUROC: 0.966 AUPRC: 0.987	-	-
Kachuee, et al.[1]	Acc: 0.934	-	F1: 0.951 Acc: 0.959	-
Baseline[2]	F1: 0.915 Acc: 0.985	F1: 0.988 Acc: 0.983	F1: 0.969 Acc: 0.956	F1: 0.994 Acc: 0.992

^* Only trained on this dataset
^° Transfer Learning, pre-trained model trained on MIT-BIH, retrained with frozen base layers
^† Transfer Learning, pre-trained model trained on MIT-BIH, retrained with unfrozen base layers
^‡ Base layers always frozen to train XGBoost
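Accuracy and F1 can diverge sharply on imbalanced data, which is one plausible reading of rows where accuracy is high but F1 is low. The sketch below illustrates the effect on made-up labels. It assumes a macro-averaged F1 score; the source does not state which averaging was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 8 of 10 samples belong to class 0 and the model always predicts 0.
y_true = [0] * 8 + [1, 2]
y_pred = [0] * 10

print(accuracy(y_true, y_pred))            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.296
```

The majority class dominates accuracy, while macro F1 gives the two rare classes the same weight as the common one.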

Visualization of learned embeddings

Figures: t-SNE, UMAP, and PCA projections of the learned embeddings, one row each for MIT-BIH and PTBDB.

For more details about the project have a look at the README.md in the project directory /ECG-time-series.

Project 2 - Diabetes Readmission Prediction

Investigating which medical features from patient records (categorical, numerical and text) play an important role in the prediction of patient readmission. Comparing models that use only numerical and categorical features, only text features, or both.
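A comparison across the three feature sets could be set up with scikit-learn along these lines. This is only a sketch under stated assumptions: the column names, the toy data, the TF-IDF text representation, and the logistic regression classifier are placeholders chosen for illustration, not the models used in the project.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real dataset's columns will differ.
NUMERIC = ["time_in_hospital"]
CATEGORICAL = ["admission_type"]
TEXT = "diagnosis_text"  # a single name, so TfidfVectorizer receives 1-D input


def build_pipeline(feature_set):
    """Build a classifier for 'tabular', 'text', or 'both' feature sets."""
    steps = []
    if feature_set in ("tabular", "both"):
        steps.append(("num", StandardScaler(), NUMERIC))
        steps.append(("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL))
    if feature_set in ("text", "both"):
        steps.append(("txt", TfidfVectorizer(), TEXT))
    # Columns not listed in a step are dropped by default.
    return make_pipeline(ColumnTransformer(steps), LogisticRegression(max_iter=1000))


# Made-up toy records, only to show that the pipelines run end to end.
df = pd.DataFrame({
    "time_in_hospital": [1, 7, 3, 9, 2, 8],
    "admission_type": ["elective", "emergency", "elective",
                       "emergency", "urgent", "emergency"],
    "diagnosis_text": ["routine check", "severe complication", "routine follow up",
                       "complication and infection", "routine check",
                       "severe infection"],
})
y = [0, 1, 0, 1, 0, 1]

for feature_set in ("tabular", "text", "both"):
    model = build_pipeline(feature_set).fit(df, y)
    print(feature_set, model.score(df, y))
```

Keeping the classifier fixed and changing only the `ColumnTransformer` steps means that differences in score can be attributed to the feature set rather than to the model.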

Results

Categorical/Numerical Features

Text Features

For more details about the project have a look at the README.md in the project directory /Diabetes-readmission.

Project 3 - Medical Image Segmentation

Using the U-Net neural network model [4] to segment MRI prostate images from the "NCI-ISBI 2013 Challenge - Automated Segmentation of Prostate Structures" into anatomical regions (Peripheral Zone & Central Gland). Part of the project was to perform hyperparameter tuning and to try different optimizers and loss functions to reduce the generalization error.
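As an illustration of what a segmentation-specific loss function looks like, here is a minimal NumPy sketch of the Dice coefficient and the corresponding Dice loss. Dice is a commonly used overlap measure for segmentation, but the source does not say which loss functions were actually tried, so treat this as an assumption. The function names and the smoothing term `eps` are illustrative.

```python
import numpy as np


def dice_coefficient(y_true, y_pred, eps=1e-7):
    """Overlap between two masks: 2 * |A ∩ B| / (|A| + |B|), in [0, 1]."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)


def dice_loss(y_true, y_pred):
    """Loss that decreases as the overlap between the masks increases."""
    return 1.0 - dice_coefficient(y_true, y_pred)


# Toy binary masks: the prediction covers 2 of the 3 foreground pixels.
mask = np.array([[0, 1, 1],
                 [0, 1, 0]])
pred = np.array([[0, 1, 0],
                 [0, 1, 0]])

print(round(dice_coefficient(mask, pred), 3))  # 0.8
print(round(dice_loss(mask, pred), 3))         # 0.2
```

Unlike pixel-wise cross-entropy, an overlap-based loss is not dominated by the background pixels, which matters when the structures of interest occupy a small part of the image.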

Results

Example Test Image and Prediction

For more details about the project have a look at the README.md in the project directory /Image-segmentation.

Project 4 - Splice Site Prediction

Splice site prediction is a common problem in computational gene finding, where the goal is to locate the splice sites that mark the boundaries between exons and introns in eukaryotes (organisms whose cells have a nucleus enclosed within membranes). This classification can then be used to predict a gene's structure, function, interactions, or role in a disease. The main challenge of this task was the high class imbalance of the splice sites.
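One common way to counter class imbalance is to weight each class inversely to its frequency during training. The sketch below computes such weights in plain Python using the "balanced" heuristic `n_samples / (n_classes * class_count)`. Whether the project used class weights, resampling, or another strategy is not stated in the source, and the label counts here are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Toy labels: 1 marks a splice site, 0 marks everything else.
labels = [0] * 990 + [1] * 10
weights = balanced_class_weights(labels)

print(weights[1])            # 50.0
print(round(weights[0], 3))  # 0.505
```

A dictionary like this can typically be passed to a training routine as per-class weights, so that errors on the rare positive class cost more than errors on the abundant negative class.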

Results

C. elegans DNA

Human DNA

For more details about the project have a look at the README.md in the project directory /Splice-site-prediction.

Requirements

pandas, numpy, scikit-learn, keras, matplotlib, xgboost, umap-learn, seaborn, keras-contrib, tensorflow-addons

References

[1] Mohammad Kachuee, Shayan Fazeli, and Majid Sarrafzadeh. "ECG Heartbeat Classification: A Deep Transferable Representation." arXiv preprint arXiv:1805.00794 (2018).

[2] CVxTz's GitHub implementation: ECG_Heartbeat_Classification (link)

[3] Beata Strack, et al. "Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records." BioMed Research International (2014). https://doi.org/10.1155/2014/781670

[4] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." LNCS 9351, pp. 234-241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28

Project Organization

This repository uses the cookiecutter data science project template, slightly adapted to our needs and requirements. Each of the four projects is in a separate folder which is a copy of the src directory.

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.testrun.org

Project based on the cookiecutter data science project template. #cookiecutterdatascience
