This repo uses the ricu R package to derive patient cohorts for prediction tasks from the following intensive care databases:
Dataset | MIMIC-III / IV | eICU-CRD | HiRID | AUMCdb |
---|---|---|---|---|
Admissions | 40k / 73k | 200k | 33k | 23k |
Version | v1.4 / v2.2 | v2.0 | v1.1.1 | v1.0.2 |
Frequency (time-series) | 1 hour | 5 minutes | 2 / 5 minutes | up to 1 minute |
Originally published | 2015 / 2020 | 2017 | 2020 | 2019 |
Origin | USA | USA | Switzerland | Netherlands |
New datasets can also be added. We are currently working on a package to make this process as smooth as possible.
We provide five common tasks for clinical prediction by default:
No | Task | Frequency | Type |
---|---|---|---|
1 | ICU Mortality | Once per Stay (after 24H) | Binary Classification |
2 | Acute Kidney Injury (AKI) | Hourly (within 6H) | Binary Classification |
3 | Sepsis | Hourly (within 6H) | Binary Classification |
4 | Kidney Function (KF) | Once per Stay | Regression |
5 | Length of Stay (LoS) | Hourly (within 7D) | Regression |
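To make an entry such as "Hourly (within 6H)" concrete, the sketch below shows one common way to build such labels: each hour of a stay is labelled 1 if the event onset falls within the following six hours. This is an illustrative assumption, not necessarily the label definition used by the cohorts in this repo; the function `hourly_labels` and its parameters are hypothetical.

```python
from typing import List, Optional


def hourly_labels(
    stay_hours: int, onset_hour: Optional[int], horizon: int = 6
) -> List[int]:
    """Return one binary label per hour of a stay (hypothetical helper).

    Hour t is labelled 1 if the onset lies strictly after t and at most
    `horizon` hours ahead, i.e. t < onset_hour <= t + horizon.
    Stays without an onset (onset_hour is None) get all-zero labels.
    In this sketch, hours at or after the onset are labelled 0.
    """
    if onset_hour is None:
        return [0] * stay_hours
    return [
        1 if t < onset_hour <= t + horizon else 0
        for t in range(stay_hours)
    ]


# A 10-hour stay with onset at hour 8: hours 2 to 7 are positive.
labels = hourly_labels(stay_hours=10, onset_hour=8)
```

A once-per-stay task, by contrast, would yield a single value per stay rather than a list.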
New tasks can be easily added. The following repositories may be relevant as well:
- YAIB: Main repository for YAIB.
- YAIB-models: Pretrained models for YAIB.
- ReciPys: Preprocessing package for YAIB pipelines.
If you use this code in your research, please cite the following publication:
@article{vandewaterYetAnotherICUBenchmark2023,
title = {Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML},
shorttitle = {Yet Another ICU Benchmark},
url = {http://arxiv.org/abs/2306.05109},
language = {en},
urldate = {2023-06-09},
publisher = {arXiv},
author = {van de Water, Robin and Schmidt, Hendrik and Elbers, Paul and Thoral, Patrick and Arnrich, Bert and Rockenschaub, Patrick},
month = jun,
year = {2023},
note = {arXiv:2306.05109 [cs]},
keywords = {Computer Science - Machine Learning},
}
This paper can also be found on arXiv: https://arxiv.org/pdf/2306.05109.pdf
Run the following commands to clone this repo:
git clone https://github.com/rvandewater/YAIB-cohorts.git
cd YAIB-cohorts
Once you have cloned the repo, all cohorts can be created either directly from within R or via an interface from Python. Instructions for each can be found at:
Note: due to some recent bug fixes in ricu, the extracted cohorts might differ marginally from those published in the benchmarking paper.
To output the cohorts in the Clairvoyance (https://github.com/vanderschaarlab/clairvoyance) format, you can use the following utils.py function:
output_clairvoyance(data_dir, save_dir, task_type="static")
You can specify the size, the type of task, and the train/test split in the make_train_test function. The task type is either "static", i.e., one outcome label per stay_id (mortality, KF), or "dynamic", i.e., one outcome label per time step (sepsis, AKI, LoS).
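The difference between the two task types, and a split that keeps each stay entirely in either the train or the test set, can be sketched as follows. This is a minimal illustration on made-up toy data, not the implementation of make_train_test in utils.py; the helper `split_stay_ids` and its parameters are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Toy data (made up): a static task has one label per stay_id,
# a dynamic task has one label per time step of each stay.
static_labels: Dict[int, int] = {1: 0, 2: 1, 3: 0, 4: 1}
dynamic_labels: Dict[int, List[int]] = {
    1: [0, 0, 0],
    2: [0, 1, 1],
    3: [0, 0],
    4: [0, 0, 1, 1],
}


def split_stay_ids(
    stay_ids: Iterable[int], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split stay IDs into train and test lists (hypothetical helper).

    Splitting on stay_id, rather than on individual rows, keeps all
    time steps of a stay on the same side of the split.
    """
    ids = sorted(stay_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


train_ids, test_ids = split_stay_ids(static_labels.keys())

# The same ID lists select rows for both task types.
train_static = {sid: static_labels[sid] for sid in train_ids}
train_dynamic = {sid: dynamic_labels[sid] for sid in train_ids}
```

Splitting by stay rather than by time step is a design choice in this sketch: it avoids time steps from one patient stay appearing in both sets.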
The code in this repository heavily utilises the ricu R package, without which deriving these cohorts would have been much more difficult. If you use the code in this repo, please go give their repo a star :)
This repo is based on earlier work by Rockenschaub et al. (2023), which can be found at https://github.com/prockenschaub/icuDG-preprocessing
This source code is released under the MIT license, included here.