bvanaken/clinical-outcome-prediction

Different Data Size

Opened this issue · 2 comments

Hi.
I ran mp/mp.py, but my dataset statistics differ from your results.
My train, valid, test sizes are (33954, 4908, 9822) (original: 33997, 4918, 9830).
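To put the gap in numbers, here is a small sketch that compares the two sets of split sizes quoted above. Only the six counts come from this thread; the variable and function names are made up for illustration.

```python
# Split sizes quoted in this issue: (train, valid, test).
ORIGINAL = {"train": 33997, "valid": 4918, "test": 9830}
REPRODUCED = {"train": 33954, "valid": 4908, "test": 9822}


def split_gaps(original, reproduced):
    """Return how many samples each reproduced split is missing."""
    return {name: original[name] - reproduced[name] for name in original}


gaps = split_gaps(ORIGINAL, REPRODUCED)
total_missing = sum(gaps.values())
share = total_missing / sum(ORIGINAL.values())

for name, gap in gaps.items():
    print(f"{name}: {gap} fewer samples")
print(f"total: {total_missing} fewer samples ({share:.3%} of the original)")
```

The gap is small and spread over all three splits, so it looks more like a handful of filtered admissions than a broken split.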

My MIMIC-III version is 1.4 (latest). I think the difference comes from the pandas version.
Could you tell me which pandas version your environment uses?
Thank you

Hi,
we have made some adjustments to the mortality prediction task in the meantime, because there were still some cases left in which the death of the patient was described in the notes. I guess the difference comes from these changes.

If you want to replicate the original dataset, you can run the mp.py script as committed on 2021/09/05: https://github.com/bvanaken/clinical-outcome-prediction/commits/master/tasks/mp/mp.py
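One possible way to get that older script is sketched below. It assumes a local clone of the repository with git on the PATH, takes the branch name and file path from the URL above, and uses a cutoff of 2021-09-06 so that commits made during 2021/09/05 are included (time zones may shift this slightly). The helper names are hypothetical.

```python
import subprocess

SCRIPT_PATH = "tasks/mp/mp.py"  # path taken from the URL above


def rev_list_cmd(before, branch="master", path=SCRIPT_PATH):
    """Command that prints the newest commit touching `path` before a date."""
    return ["git", "rev-list", "-1", f"--before={before}", branch, "--", path]


def checkout_cmd(commit, path=SCRIPT_PATH):
    """Command that restores `path` as it was at `commit`."""
    return ["git", "checkout", commit, "--", path]


def restore_script(repo_dir, before="2021-09-06"):
    """Restore the script to its last state before `before`; return the commit."""
    result = subprocess.run(
        rev_list_cmd(before), cwd=repo_dir,
        check=True, capture_output=True, text=True,
    )
    commit = result.stdout.strip()
    if not commit:
        raise RuntimeError(f"no commit touching {SCRIPT_PATH} before {before}")
    subprocess.run(checkout_cmd(commit), cwd=repo_dir, check=True)
    return commit
```

Calling `restore_script("path/to/clinical-outcome-prediction")` only changes that one file in the working tree; the rest of the checkout stays at its current commit.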

Best
Betty

Just in case anyone else has the same issue of not getting any data subset when running the code on the latest version: I downgraded all packages in requirements.txt to versions from before 2021/09 to make it work (I haven't checked the data proportions, though).

numpy==1.21.0
pandas==1.3.2
nltk==3.6.2
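To check whether the current environment matches these pins before running the script, something like the following could be used. It is a minimal sketch: the three version numbers are the ones listed above, while `find_mismatches` is a hypothetical helper built on the standard library's `importlib.metadata`.

```python
from importlib import metadata

# Versions quoted in the comment above.
PINS = {"numpy": "1.21.0", "pandas": "1.3.2", "nltk": "3.6.2"}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in find_mismatches(PINS).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```

No output means all three packages are installed at the listed versions.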