Top 50 codes
How were the top 50 codes chosen?
The top 50 list doesn't seem to agree with the one used in https://github.com/jamesmullenbach/caml-mimic (Explainable Prediction of Medical Codes from Clinical Text, https://arxiv.org/abs/1802.05695). For example, it doesn't include what is probably the most common diagnosis on the planet: unspecified hypertension (4019).
Is it based on the primary diagnoses (first code)?
We subsetted to the top 50 most frequent primary diagnostic codes, as noted. From my reading of MIMIC-III, these codes (SEQ_NUM==1) are the most important, while the remaining codes per HADM_ID appear in a sequence that may or may not have anything to do with importance. You could try to train a linear ranker, like we attempted in X-Transformer-ICD, to account for this multi-label ranked case, though there are about 2e49 possible orderings per patient, assuming 12 codes drawn from the 13,000 unique diagnosis ICD codes appearing in MIMIC-III. The classifier was mainly meant to show the relative encoding performance of the different models.
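For anyone who wants to see why the two lists differ, here is a minimal sketch, not the code actually used in this repo. It counts codes over toy rows shaped like the MIMIC-III DIAGNOSES_ICD table (HADM_ID, SEQ_NUM, ICD9_CODE), once over primary codes only and once over all codes, and then checks the 2e49 figure. The helper `top_k_codes` and the toy rows are hypothetical and made up for illustration.

```python
from collections import Counter
from math import perm

# Toy rows with DIAGNOSES_ICD-style columns. The admissions and counts are
# invented for illustration and are NOT taken from MIMIC-III.
rows = [
    {"HADM_ID": 1, "SEQ_NUM": 1, "ICD9_CODE": "41401"},
    {"HADM_ID": 1, "SEQ_NUM": 2, "ICD9_CODE": "4019"},
    {"HADM_ID": 2, "SEQ_NUM": 1, "ICD9_CODE": "0389"},
    {"HADM_ID": 2, "SEQ_NUM": 2, "ICD9_CODE": "4019"},
    {"HADM_ID": 3, "SEQ_NUM": 1, "ICD9_CODE": "41401"},
    {"HADM_ID": 3, "SEQ_NUM": 2, "ICD9_CODE": "4019"},
]


def top_k_codes(rows, k, primary_only):
    """Return the k most frequent codes (hypothetical helper).

    With primary_only=True, only rows with SEQ_NUM == 1 are counted.
    """
    counts = Counter(
        r["ICD9_CODE"]
        for r in rows
        if not primary_only or r["SEQ_NUM"] == 1
    )
    return [code for code, _ in counts.most_common(k)]


# Primary codes only: 4019 never appears first in the toy data, so it drops out.
top_primary = top_k_codes(rows, k=2, primary_only=True)

# All codes: 4019 is a frequent secondary code, so it tops the list.
top_all = top_k_codes(rows, k=2, primary_only=False)

# Ordered selections of 12 distinct codes out of 13,000, i.e. 13000!/(13000-12)!
n_orderings = perm(13000, 12)

print(top_primary, top_all, f"{n_orderings:.2e}")
```

In the toy data 4019 only ever shows up as a secondary code, so it ranks first when all codes are counted and disappears when only SEQ_NUM == 1 is counted. If you read the real CSV with the `csv` module, SEQ_NUM arrives as a string, so the comparison would need adjusting.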
Yeah, those are the primary diagnoses. Thank you for the answer, and good work!