bvanaken/clinical-outcome-prediction

Mortality prediction: data preparation does not remove all death indications


Hello,

First of all, thank you for creating and publishing these benchmark tasks. I am excited to see more research focused on clinical NLP, and these tasks will certainly help. I am using the Mortality Prediction task in some of our work, and I followed the data preparation instructions for the simulated admission notes. I noticed that statements that the patient expired (e.g. "patient expired" or "pt expired") are not removed by the text preparation code. Here's a code snippet to find these cases:

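import pandas as pd

mp_train = pd.read_csv("path/to/MP_IN_adm_train.csv")
# raw string so "\s" reaches the regex engine intact; na=False treats missing notes as non-matches
expired = mp_train[mp_train['text'].str.lower().str.contains(r"(?:patient|pt)\s+expired", regex=True, na=False)]
expired.shape[0]  # 56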

It is a small population, but we discovered these cases when a bag-of-words model assigned a very high weight to the stemmed unigram "expir". When we explored further, we saw that these descriptions appear most often in either the PHYSICAL EXAM or the PRESENT ILLNESS section. The PRESENT ILLNESS cases also tend to contain additional descriptions of end-of-life care during the ICU stay, for example:

In concert with her family and the patient, it was decided to withdraw care at this point, and not pursue further aggressive medical measures. Patient was changed in code status to comfort measures only. She was started on Morphine drip. The patient expired on...

Thanks,
Mitch

Dear Mitch,

Happy to hear that you are working with our benchmark tasks, and thanks a lot for pointing us to these cases!
I will update the preparation code for the mortality prediction task to exclude the cases you've found.
Looking at the notes, it probably makes sense to remove only the mention itself when "patient expired" appears in PHYSICAL EXAM, and to exclude the whole case when we find it in any other section.
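The section-dependent rule above could look roughly like the following. This is a minimal, hypothetical sketch, not the code that went into mp.py: it assumes each note marks its sections with upper-case headers followed by a colon (e.g. `PHYSICAL EXAM:`), and the header list `KNOWN_HEADERS`, the helpers `split_sections` and `filter_note`, and the choice to delete the whole sentence containing the mention are illustrative assumptions.

```python
import re
from typing import List, Optional, Tuple

# Assumption: the section headers used in the admission notes (hypothetical list).
KNOWN_HEADERS = [
    "CHIEF COMPLAINT", "PRESENT ILLNESS", "MEDICAL HISTORY",
    "MEDICATION ON ADMISSION", "ALLERGIES", "PHYSICAL EXAM",
    "FAMILY HISTORY", "SOCIAL HISTORY",
]
HEADER_RE = re.compile("(" + "|".join(map(re.escape, KNOWN_HEADERS)) + "):")
EXPIRED_RE = re.compile(r"\b(?:patient|pt)\s+expired\b", re.IGNORECASE)
# Whole sentence containing the mention, plus trailing whitespace.
EXPIRED_SENTENCE_RE = re.compile(
    r"[^.]*\b(?:patient|pt)\s+expired\b[^.]*\.?\s*", re.IGNORECASE
)


def split_sections(text: str) -> List[Tuple[str, str]]:
    """Split a note into (header, body) pairs; text before the first header gets header ''."""
    matches = list(HEADER_RE.finditer(text))
    sections = []
    first_start = matches[0].start() if matches else len(text)
    if first_start > 0:
        sections.append(("", text[:first_start]))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(1), text[m.end():end]))
    return sections


def filter_note(text: str) -> Optional[str]:
    """Return the cleaned note, or None if the whole case should be excluded."""
    out = []
    for header, body in split_sections(text):
        if EXPIRED_RE.search(body):
            if header != "PHYSICAL EXAM":
                return None  # mention outside PHYSICAL EXAM: drop the case
            body = EXPIRED_SENTENCE_RE.sub("", body)  # drop only the sentence
        out.append(f"{header}:{body}" if header else body)
    return "".join(out)
```

To apply it to the training file loaded earlier, one could map `filter_note` over the `text` column and then drop the rows for which it returned `None`.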

I found only 5 such cases in the test set, so I don't expect this to significantly affect the scores.
I'll update you when the change is merged.

Thanks again!
Betty

I have now updated the code to exclude mentions of "patient expired" and some further indications of mortality. You can find the changes in the mp.py file.

Thanks again for reaching out and I hope the data is useful to you!
Best, Betty

Thank you, Betty!

Take care,
Mitch