Development of an ensemble machine learning prognostic model for predicting 60-day risk of major adverse cardiac events in adults with chest pain
Authors: Chris J. Kennedy, Dustin G. Mark, Jie Huang, Mark J. van der Laan, Alan E. Hubbard, Mary E. Reed
This code is intended to run on R 4.0.4. Exact version numbers for all software packages are listed in the renv.lock file.
Open the repository as an RStudio project, which will activate renv.
Then run:
renv::restore()
This will install all necessary packages with the correct versions. It will take about 30 minutes to run.
The following data files (EHR exports) need to be placed in the data-raw directory:
- data_grace3.sas7bdat - main data file
- _outrace.sas7bdat - patient race
- _outgfr.sas7bdat - eGFR
- _outbmi.sas7bdat - BMI
- _lab_vdw.sas7bdat - A1C, HDL, LDL, triglycerides
- chestpaindata.sas7bdat - MACE+ supplemental outcome
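Before knitting, it can help to confirm that all six exports are in place. Below is a minimal base-R sketch. The helper name `check_raw_data()` is hypothetical and not part of this repository, and it assumes it is run from the project root.

```r
# Hypothetical helper (not part of this repository): report which of the
# required EHR exports are missing from the data-raw directory.
check_raw_data <- function(dir = "data-raw") {
  required <- c(
    "data_grace3.sas7bdat",   # main data file
    "_outrace.sas7bdat",      # patient race
    "_outgfr.sas7bdat",       # eGFR
    "_outbmi.sas7bdat",       # BMI
    "_lab_vdw.sas7bdat",      # A1C, HDL, LDL, triglycerides
    "chestpaindata.sas7bdat"  # MACE+ supplemental outcome
  )
  # Keep only the file names that do not exist under `dir`.
  required[!file.exists(file.path(dir, required))]
}
```

Calling `check_raw_data()` returns `character(0)` when every file is present. Otherwise it returns the names of the missing files.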
Then knit the scripts in the following order:
- import-data.Rmd
- estimator-superlearner.Rmd - takes 3+ days to run, depending on the number of CPU cores.
- variable-importance.Rmd
- interpretation.Rmd
- decision-analysis.Rmd
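Knitting from RStudio is the documented route. The same order can also be scripted with `rmarkdown::render()`. The sketch below assumes it is run from the project root. The function name `knit_all()` and its `render_fn` parameter are hypothetical. `render_fn` exists only so the loop can be exercised without starting the multi-day analysis.

```r
# Hypothetical helper: render the notebooks in the order listed above.
# render_fn defaults to rmarkdown::render(); it is only evaluated when used.
knit_all <- function(render_fn = rmarkdown::render) {
  scripts <- c(
    "import-data.Rmd",
    "estimator-superlearner.Rmd",  # 3+ days, depending on CPU cores
    "variable-importance.Rmd",
    "interpretation.Rmd",
    "decision-analysis.Rmd"
  )
  for (script in scripts) {
    message("Knitting ", script)
    # Give each notebook a fresh environment, loosely mimicking separate knits
    # (loaded packages still persist within the same R session).
    # An error in any notebook stops the loop before later ones run.
    render_fn(script, envir = new.env(parent = globalenv()))
  }
  invisible(scripts)
}
```

All notebooks run in one R session here, unlike separate RStudio knits. If results differ from knitting each file individually, the RStudio route takes precedence.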
Note: in import-data.Rmd the GLRM grid search is disabled by default because it is CPU-intensive. To re-enable it, go to the glrm_grid_search section and set eval = TRUE in its RMarkdown header (the current value is FALSE).
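The following sketch assumes glrm_grid_search is the label of an R Markdown code chunk in import-data.Rmd. The chunk body shown is a placeholder, not the repository's code. Under that assumption, the edit changes only the eval option in the chunk header:

````
```{r glrm_grid_search, eval = TRUE}
# ... existing grid-search code, unchanged ...
```
````

With eval = FALSE, knitr does not evaluate the chunk's code. Switching it to TRUE makes the grid search run on the next knit.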
Examples and demonstration results for the accompanying ck37r package are provided in its GitHub repository.
The contents of this repository are distributed under the MIT license.
© Chris J. Kennedy, 2021