som-shahlab/ehrshot-benchmark

Train/dev/test split

GGGGFan opened this issue · 2 comments

Thank you for making such amazing work publicly available. I was wondering where to find the train/dev/test splits needed to reproduce the results in your preprint. I only found chexpert_labeled_radiology_notes.csv, which contains split information for the Chest X-Ray Findings task, but nothing for the other tasks. Thank you!

Hi @GGGGFan, thank you for the note, and we appreciate your interest in our work!

The patient splits are currently computed inline in the code here: https://github.com/som-shahlab/ehrshot-benchmark/blob/be3c771e468c909dc7f35bc59302fd86b24a8f30/ehrshot/4_generate_shot.py#L112C5-L115C51
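The linked lines are not reproduced here. As a rough illustration of what a deterministic, patient-level split does (every patient lands in exactly one split, and rerunning with the same seed gives the same assignment), here is a minimal sketch. The SHA-256 hashing, the 70/15/15 proportions, and the names `assign_split` and `split_patients` are assumptions for illustration only, not the repository's actual logic.

```python
import hashlib


def assign_split(patient_id: int, seed: int = 0, train_pct: int = 70, dev_pct: int = 15) -> str:
    """Hypothetical helper: map a patient ID to 'train', 'dev', or 'test' deterministically."""
    # Hash the seed together with the ID so the assignment is stable across runs
    # but changes if a different seed is used.
    digest = hashlib.sha256(f"{seed}:{patient_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # integer bucket in 0..99
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + dev_pct:
        return "dev"
    return "test"


def split_patients(patient_ids, seed: int = 0) -> dict:
    """Hypothetical helper: group patient IDs by their assigned split."""
    splits = {"train": [], "dev": [], "test": []}
    for pid in patient_ids:
        splits[assign_split(pid, seed)].append(pid)
    return splits


if __name__ == "__main__":
    s = split_patients(range(1000))
    print({name: len(ids) for name, ids in s.items()})
```

Splitting by patient rather than by label row keeps all of a patient's labels in the same split, which avoids leaking information about one patient between train and test.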

To make things simpler, however, we will release a file containing the exact patient IDs in each split. Unfortunately, as noted in this issue, we are still sorting out some data release logistics before we can publish any additional files. I will update you as soon as this is resolved. Thank you for your patience!

@GGGGFan the dataset with splits should be open now, sorry for the delay! https://redivis.com/datasets/53gc-8rhx41kgt
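With a per-patient split file in hand, restricting a task's labels to one split is a simple lookup. A minimal sketch, assuming a hypothetical CSV with `patient_id` and `split` columns and label rows that carry a `patient_id` field; the actual file names, column names, and split labels in the Redivis release may differ.

```python
import csv
import io


def load_split_map(split_csv) -> dict:
    """Read a CSV with (assumed) 'patient_id' and 'split' columns into {patient_id: split}."""
    return {int(row["patient_id"]): row["split"] for row in csv.DictReader(split_csv)}


def filter_labels(labels, split_map: dict, split: str) -> list:
    """Keep only label rows whose patient is assigned to the requested split.

    Patients missing from the split map are dropped.
    """
    return [row for row in labels if split_map.get(int(row["patient_id"])) == split]


if __name__ == "__main__":
    # Toy in-memory data standing in for the released split file and a task's labels.
    split_map = load_split_map(io.StringIO("patient_id,split\n1,train\n2,dev\n3,test\n"))
    labels = [
        {"patient_id": "1", "value": "True"},
        {"patient_id": "3", "value": "False"},
        {"patient_id": "4", "value": "True"},
    ]
    print(filter_labels(labels, split_map, "train"))
```

Looking up the split for each label row, rather than re-deriving it, guarantees that results are computed on exactly the released patient sets.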