Code for "Coarse race data conceals disparities in clinical risk score performance," published at MLHC 2023.
Link to paper: https://arxiv.org/abs/2304.09270
- First, download MIMIC-IV-ED (this project was run with version 2.2). You'll need the hosp, icu, and ed modules. In preprocessing/paths.py, set the data path to the directory that contains these three folders as subdirectories.
- Run the extract_main_dataset.ipynb notebook* to generate a preprocessed dataframe for the emergency department prediction tasks that we study in the paper.
- Run the collect-and-plot-granular-performance-metrics-ML.ipynb notebook to train logistic regression models on the outcomes, store performance metrics, and plot the results (this reproduces Figure 1).
- Run the compute-significance-and-compare-amount-of-variation.ipynb notebook to reproduce Table 2 and Figure 2.
*The initial version of the preprocessing code comes from the following reference: Xie, Feng, Jun Zhou, Jin Wee Lee, Mingrui Tan, Siqi Li, Logasan S/O Rajnthern, Marcel Lucas Chee, et al. 2022. “Benchmarking Emergency Department Prediction Models with Machine Learning and Public Electronic Health Records.” Scientific Data 9 (1): 658. https://doi.org/10.1038/s41597-022-01782-9. See here: https://github.com/nliulab/mimic4ed-benchmark.
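
Before running the notebooks, it can help to confirm that the data directory from the first step is laid out as expected. The following is a minimal sketch, not part of this repository: the helper name missing_modules is hypothetical, and the sketch assumes only what the first step states, namely that the hosp, icu, and ed folders sit directly under the directory configured in preprocessing/paths.py.

```python
from pathlib import Path

# Module folders named in the first setup step.
REQUIRED_MODULES = ("hosp", "icu", "ed")


def missing_modules(data_root):
    """Return the required module folders that are absent under data_root.

    Hypothetical helper for illustration; it does not read preprocessing/paths.py.
    """
    root = Path(data_root)
    return [name for name in REQUIRED_MODULES if not (root / name).is_dir()]


if __name__ == "__main__":
    import sys

    # Pass the same directory you configured in preprocessing/paths.py.
    data_root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_modules(data_root)
    if missing:
        print(f"Missing under {data_root}: {', '.join(missing)}")
    else:
        print(f"Found hosp, icu, and ed under {data_root}")
```

An empty list from missing_modules means all three folders are present. The check looks only at folder names, not at the files inside them or at the dataset version.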