Mortality prediction: data preparation does not remove all death indications
Hello,
First of all, thank you for publishing and creating these benchmark tasks. I am excited to see more research focused on clinical NLP, and these tasks will certainly help. I am in the process of using the Mortality Prediction task in some of our work, and I followed the instructions for data preparation using simulated admission notes. I noticed that descriptions of "patient expired" are not captured by the text preparation code. Here's a code snippet to find these cases:
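import pandas as pd
# Load the training split produced by the data preparation step
mp_train = pd.read_csv("path/to/MP_IN_adm_train.csv")
# Raw string avoids the invalid "\s" escape; na=False treats missing notes as non-matches
pattern = r"(?:patient|pt)\s+expired"
expired = mp_train[mp_train["text"].str.lower().str.contains(pattern, regex=True, na=False)]
expired.shape[0]  # 56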
It is a small population, but we discovered these cases when a bag-of-words model highly prioritized the stemmed unigram "expir". When we explored further, we saw that these descriptions appear most often within either the PHYSICAL EXAM or PRESENT ILLNESS section. The latter cases seem to have additional descriptors of end-of-life care occurring within the ICU stay, for example:
In concert with her family and the patient, it was decided to withdraw care at this point, and not pursue further aggressive medical measures. Patient was changed in code status to comfort measures only. She was started on Morphine drip. The patient expired on...
Thanks,
Mitch
Dear Mitch,
Happy to hear that you are working with our benchmark tasks, and thanks a lot for pointing us to these cases!
I will update the preparation code for the mortality prediction task to exclude the cases you've found.
Looking at the notes, it probably makes sense to just remove the mention when "patient expired" appears in PHYSICAL EXAM and to fully exclude the case when we find it in any other section.
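As a rough illustration of that rule, here is a minimal sketch. It is not the code in the repository. The helper names `split_sections` and `clean_note` are hypothetical, and it assumes that section headers are upper-case names followed by a colon at the start of a line and that "removing the mention" means dropping the matched phrase up to the end of its sentence.

```python
import re

# Assumption: headers look like "PHYSICAL EXAM:" at the start of a line.
SECTION_HEADER = re.compile(r"^([A-Z][A-Z ]+):", re.MULTILINE)
# Assumption: a mention runs from "patient/pt expired" to the end of its sentence.
EXPIRED = re.compile(r"\b(?:patient|pt)\s+expired[^.\n]*\.?", re.IGNORECASE)


def split_sections(text):
    """Split a note into (section_name, chunk) pairs, headers kept in the chunk."""
    starts = [m.start() for m in SECTION_HEADER.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first header
    bounds = starts + [len(text)]
    sections = []
    for begin, end in zip(bounds, bounds[1:]):
        chunk = text[begin:end]
        header = SECTION_HEADER.match(chunk)
        name = header.group(1).strip() if header else ""
        sections.append((name, chunk))
    return sections


def clean_note(text):
    """Return the cleaned note, or None if the whole case should be excluded."""
    cleaned = []
    for name, chunk in split_sections(text):
        if EXPIRED.search(chunk):
            if name != "PHYSICAL EXAM":
                return None  # mention outside PHYSICAL EXAM: exclude the case
            chunk = EXPIRED.sub("", chunk)  # inside PHYSICAL EXAM: drop the mention
        cleaned.append(chunk)
    return "".join(cleaned)
```

On a DataFrame this could be applied with something like `mp_train["text"].apply(clean_note)`, dropping the rows that come back as `None`. The header pattern is deliberately loose, so an abbreviation such as `HR:` at the start of a line would also be treated as a section and would need handling in real notes.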
I have only found 5 cases in the test set, so I expect it will not significantly affect the scores.
I'll update you when the change is merged.
Thanks again!
Betty
I have now updated the code to exclude mentions of "patient expired" and some further mortality indications. You can find the changes in the mp.py file.
Thanks again for reaching out and I hope the data is useful to you!
Best, Betty
Thank you, Betty!
Take care,
Mitch