Multi-Modality Machine Learning Predicting Parkinson’s Disease (Online Appendix)
LNG ❤️ Open Science 😍
Summary
This is the online appendix for the manuscript titled "Multi-Modality Machine Learning Predicting Parkinson’s Disease", in which we integrated clinico-demographic, genetic, and transcriptomic data within an automated machine learning open science framework (GenoML) to predict Parkinson’s disease and identify potential novel therapeutic targets for drug development.
Last Updated: June 2021
Workflow Diagram
Helpful Links
Orientation
- The `scripts/` directory includes the pre-processing, munging, training, tuning, optimization, and networks scripts (coming soon!)
- The `topmodel_G1E5T1E2/` directory includes:
  - Top model's figures, metrics, and .joblib model
  - Genetics-only (p-value threshold of 1E-5) figures, metrics, and .joblib model
  - Transcriptomics-only (p-value threshold of 1E-2) figures, metrics, and .joblib model
  - Clinicodemographic-only figures, metrics, and .joblib model
- The `shapAnalysis/` directory includes:
  - Surrogate XGBoostClassifier models for the combined model, rsIDs only, ENSGs only, or rsIDs and ENSGs only
  - Figures showing the top 5% of features in the combined model
  - A directory with dependence plots between each SNP and its effect on PRS90
- The `trained_joblibs/` directory includes all 49 trained models in `.joblib` format
- The `tuned_joblibs/` directory includes all 49 tuned models in `.joblib` format
- The `trained_metricsplots/` directory includes all metrics for the 49 trained models:
  - `*.trainedModel_withheldSample_probabilities.png`
  - `*.trainedModel_withheldSample_ROC.png`
  - `*.training_withheldSamples_performanceMetrics.csv`
- The `tuned_metricsplots/` directory includes all metrics for the 49 tuned models:
  - `*.tunedModel_CV_Summary.csv`
  - `*.tunedModel_top10Iterations_Summary.csv`
- The `tested_metricsplots/` directory includes all metrics for the 49 models tested in PDBP:
  - `*.testedModel_allSample_probabilities.png`
  - `*.testedModel_allSample_ROC.png`
  - `*.testedModel_allSamples_performanceMetrics.csv`
- The `tables/` directory includes all the tables referenced in the manuscript and the supplemental tables
Sheet Tab Name | Description |
---|---|
T1 | Table 1: Descriptive statistics of studies included from AMP-PD. |
T2 | Table 2: Performance metric summaries comparing training in withheld samples in PPMI |
T3 | Table 3: Performance metric summaries comparing at tuned cross-validation in withheld samples in PPMI |
T4 | Table 4: Performance metric summaries comparing combined tuned and untuned model performance on PDBP validation dataset |
T5 | Table 5: Optimizing the AUC threshold in withheld training samples and in the validation data |
ST1 | Supp. Table 1: Complete performance metrics for best combined method comparing training in withheld samples in PPMI |
ST2 | Supp. Table 2: Rarer coding variant burden analyses for genes under GWAS peaks |
ST3 | Supp. Table 3: Complete summary statistics for QTL Mendelian randomization |
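
The output directories listed under Orientation follow a `<prefix>.<suffix>` naming pattern, so the files can be indexed programmatically. The snippet below is a minimal sketch, not part of the repository's own scripts: the helper names `index_by_prefix` and `load_model` are hypothetical, and it assumes the directories are laid out as described here and that each `.joblib` file can be read with `joblib.load`.

```python
from pathlib import Path

# Suffix copied from the trained_metricsplots/ listing in this README.
TRAINED_METRICS_SUFFIX = ".training_withheldSamples_performanceMetrics.csv"


def index_by_prefix(directory, suffix):
    """Map each file's prefix (the part matched by '*') to its path.

    Hypothetical helper: only files in `directory` whose names end with
    `suffix` are returned.
    """
    found = {}
    for path in sorted(Path(directory).glob("*" + suffix)):
        prefix = path.name[: -len(suffix)]
        found[prefix] = path
    return found


def load_model(joblib_path):
    """Load one serialized model (hypothetical helper).

    Requires the third-party joblib package; it is imported lazily so the
    indexing helper above works without it installed.
    """
    import joblib

    return joblib.load(joblib_path)
```

For example, `index_by_prefix("trained_metricsplots", TRAINED_METRICS_SUFFIX)` would return one entry per model prefix found in that directory. Loading a model with `load_model` will likely also require the libraries the model was trained with to be installed in compatible versions.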