Input window for time-series data
ChantalMP opened this issue · 4 comments
Hi,
Thanks for your exciting work!
I was wondering if all data throughout the patient's stay is used to form the patient embedding.
Especially for Mortality and Discharge prediction, the paper mentions the labels are defined relative to patient admission. Does this mean no time-series data is used as it does not yet exist for the patient? Or is the entire time-series data used? If the complete data is used, wouldn't the length of the time-series records alone have a strong correlation to the final output label?
Thanks a lot in advance,
Chantal
Hi Chantal,
Thanks for the kind words. All data recorded before the prediction time is used to generate each patient embedding, and this includes the time series. If you look at how we process time series, we use summary statistics of each signal as the time-series features, so the trends in the signals matter more than their length per se. That said, time-series length will certainly correlate with mortality: a complicated patient who stays in the hospital for a long time is more likely to die than one who has just been admitted. We did not conduct sensitivity analyses on time-series length and mortality class, but I would expect length of stay and mortality to be correlated, and for good reason. Decoupling them might actually not be advantageous if you think about it carefully. Thanks for your thoughtful note, and I hope this helps.
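To make the "statistics as features" point concrete, here is a minimal sketch, not the repository's actual preprocessing. The function name `summarize_signal`, the chosen statistics, and the strict "before the prediction time" cutoff are all assumptions made for illustration. It shows that a signal is reduced to a fixed-size summary regardless of how many measurements it contains, and that measurements at or after the prediction time are dropped.

```python
from statistics import mean, pstdev


def summarize_signal(times_h, values, prediction_time_h):
    """Summarize one signal using only observations before the prediction time.

    Hypothetical helper: times are hours since admission, and the set of
    statistics below is an illustrative choice, not the authors' exact one.
    """
    # Keep only measurements taken strictly before the prediction time.
    kept = sorted((t, v) for t, v in zip(times_h, values) if t < prediction_time_h)
    if not kept:
        return None  # nothing observed yet for this signal

    ts = [t for t, _ in kept]
    vs = [v for _, v in kept]

    # Least-squares slope as a simple trend estimate (0.0 if it is undefined).
    t_bar, v_bar = mean(ts), mean(vs)
    denom = sum((t - t_bar) ** 2 for t in ts)
    slope = (
        sum((t - t_bar) * (v - v_bar) for t, v in zip(ts, vs)) / denom
        if denom > 0
        else 0.0
    )

    # The output has the same size no matter how long the record is.
    return {
        "mean": v_bar,
        "std": pstdev(vs),
        "min": min(vs),
        "max": max(vs),
        "last": vs[-1],
        "slope": slope,
    }


# Heart rate measured at hours 0, 1, 2, 3 and 50; prediction made at hour 48,
# so the measurement at hour 50 is ignored.
features = summarize_signal([0, 1, 2, 3, 50], [80, 82, 84, 86, 200], 48)
```

Because the summary has a fixed size, record length is not an explicit input here, although it can still influence the statistics indirectly.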
Thanks for your quick and helpful reply.
Just one clarification question: What do you define as the prediction time? So e.g. for 48h-discharge prediction, is the prediction time at the beginning or end of these 48 hours?
My thought was less about inferring that a patient is complicated from a long stay; that makes total sense. Rather, if the prediction time is always the time of admission, patients with a record longer than 48 hours cannot have been discharged or have died within the first 48 hours of their stay, while patients with a record shorter than 48 hours probably either died or were released.
Thanks again!
We define the prediction time as the beginning of those 48 hours. That is, we make the prediction at, say, t = N, and the binary mortality label is obtained by looking at the patient's state at t = N + 48 hrs. I believe that for this specific task, if a patient was discharged alive less than 48 hrs after admission, we labeled them as alive at 48 hrs after the prediction time, unless they had a subsequent new hospitalization within 48 hrs of that previous event and died within 48 hrs of the admission time of the first hospitalization. Hope this helps.
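As a rough sketch of that labeling rule (an illustration under assumptions, not the code we ran): the hypothetical function `mortality_label` below assumes a single recorded death time per patient, tracked across hospitalizations, so that an early discharge followed by a death inside the window still counts as positive. The window boundaries (exclusive start, inclusive end) are also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def mortality_label(
    prediction_time: datetime,
    death_time: Optional[datetime],
    horizon: timedelta = timedelta(hours=48),
) -> int:
    """Return 1 if the patient died within `horizon` after the prediction time.

    Hypothetical helper: `death_time` is None for patients with no recorded
    death. A patient discharged alive before the horizon ends is labeled 0
    unless a death is recorded inside the window.
    """
    if death_time is None:
        return 0
    # Assumed boundaries: after the prediction time, up to and including N + 48h.
    return int(prediction_time < death_time <= prediction_time + horizon)


admission = datetime(2020, 1, 1, 8, 0)  # prediction made at admission, t = N

# Died 30 hours after the prediction time: positive label.
died_early = mortality_label(admission, admission + timedelta(hours=30))
# Discharged alive after 20 hours, no recorded death: negative label.
discharged_alive = mortality_label(admission, None)
# Died 72 hours after the prediction time: outside the window, negative label.
died_late = mortality_label(admission, admission + timedelta(hours=72))
```

Note that the label depends only on the prediction time and the outcome time, not on how long the patient's record is.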
Thanks a lot :)