Top 50 codes

Question

Closed this issue 4 years ago · 2 comments

How were the top 50 codes chosen?

The top 50 list doesn't seem to agree with the one used in https://github.com/jamesmullenbach/caml-mimic (Explainable Prediction of Medical Codes from Clinical Text, https://arxiv.org/abs/1802.05695). For example, it doesn't include what is probably the most common diagnosis on the planet: unspecified hypertension (4019).

Is it based on the primary diagnoses (first code)?

Answer 1 · 2020-12-17T00:47:43.000Z

We subsetted to the top 50 most frequent primary diagnostic codes, as noted. From my reading of MIMIC-III, these codes (SEQ_NUM==1) are the most important; the remaining codes for each HADM_ID appear in a sequence that may or may not have anything to do with importance. You could try training a linear ranker, as we attempted in X-Transformer-ICD, to account for this multi-label ranked case, though there are about 2e49 possible orderings per patient, assuming 12 codes drawn from the 13,000 unique diagnosis ICD codes appearing in MIMIC-III. The classifier was mainly meant to show the relative encoding performance of the different models.
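To make the difference concrete, here is a minimal sketch (not the code used in this repository) of how the ranking changes depending on whether only SEQ_NUM==1 rows are counted. The rows are made-up values shaped like (HADM_ID, SEQ_NUM, ICD9_CODE) from the MIMIC-III DIAGNOSES_ICD table, and the helper name `top_k_codes` is hypothetical. The last line checks the 2e49 figure as the number of ordered selections of 12 distinct codes out of 13,000.

```python
from collections import Counter
import math

# Hypothetical sample rows shaped like MIMIC-III DIAGNOSES_ICD:
# (HADM_ID, SEQ_NUM, ICD9_CODE). The values are invented for illustration.
rows = [
    (100, 1, "41401"), (100, 2, "4019"),
    (101, 1, "0389"),  (101, 2, "4019"),
    (102, 1, "41401"), (102, 2, "4019"),
    (103, 1, "4019"),
]


def top_k_codes(rows, k, primary_only):
    """Return the k most frequent ICD codes.

    When primary_only is True, only rows with SEQ_NUM == 1 are counted.
    """
    counts = Counter(
        code
        for _hadm_id, seq_num, code in rows
        if not primary_only or seq_num == 1
    )
    return [code for code, _count in counts.most_common(k)]


# Counting primary diagnoses only versus counting every listed code.
top_primary = top_k_codes(rows, k=1, primary_only=True)
top_all = top_k_codes(rows, k=1, primary_only=False)

# Ordered selections of 12 distinct codes out of 13,000 (requires Python 3.8+).
n_orderings = math.perm(13000, 12)
print(top_primary, top_all, f"{n_orderings:.1e}")
```

In this toy data, 4019 is the most frequent code overall but not the most frequent primary code, which is the same effect that keeps it out of a primary-only top 50 when it is mostly recorded as a secondary diagnosis.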

Answer 2 · 2020-12-17T00:49:26.000Z

Yeah, those are the primary diagnoses. Thank you for the answer and good work!