broadinstitute/ml4h

look for column with MRN in sample_csv, or let user specify

Closed this issue · 0 comments

erikr commented

What
_sample_csv_to_set should try to find the MRN column instead of assuming the MRN is in row[0].

Why
https://github.com/broadinstitute/ml/blob/b18e4bc2eb60471b8e8e560341ec384987da66b9/ml4cvd/tensor_generators.py#L507-L517

This code assumes the MRN is in the first column, which is not always true.

How
First, check whether the user specified the column name. The downside is that this adds yet another argument, and args is already getting bloated.

Second, check whether the CSV has a header, and if it does, look for common header names such as mrn, medrecnum, patient_id, etc.

Third, if neither of the above finds a column, fall back to assuming the MRN is in column 0.
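
A rough sketch of the three-step lookup, for discussion only. The names `find_mrn_column`, `sample_csv_to_set_sketch`, `mrn_column_name`, and `MRN_HEADER_CANDIDATES` are hypothetical and are not taken from the existing code. The sketch also assumes MRNs are numeric, which it uses to decide whether the first row of an unrecognized CSV is a header.

```python
import csv
from typing import List, Optional, Set

# Hypothetical list of common MRN header names; extend as needed.
MRN_HEADER_CANDIDATES = ("mrn", "medrecnum", "patient_id")


def find_mrn_column(header: List[str], mrn_column_name: Optional[str] = None) -> Optional[int]:
    """Return the index of the MRN column in a header row, or None if not found."""
    normalized = [cell.strip().lower() for cell in header]
    # Step 1: a user-specified column name takes priority.
    if mrn_column_name is not None:
        name = mrn_column_name.strip().lower()
        if name not in normalized:
            raise ValueError(f"Column {mrn_column_name!r} not found in header {header}")
        return normalized.index(name)
    # Step 2: look for a common MRN header name.
    for candidate in MRN_HEADER_CANDIDATES:
        if candidate in normalized:
            return normalized.index(candidate)
    return None


def sample_csv_to_set_sketch(path: str, mrn_column_name: Optional[str] = None) -> Set[str]:
    """Read a sample CSV and return the set of MRNs as strings."""
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    if not rows:
        return set()
    index = find_mrn_column(rows[0], mrn_column_name)
    if index is not None:
        data_rows = rows[1:]  # first row was a recognized header
    else:
        # Step 3: fall back to column 0.
        index = 0
        # Assumption: MRNs are numeric, so a non-numeric first cell is a header row.
        first_cell = rows[0][0].strip()
        data_rows = rows if first_cell.isdigit() else rows[1:]
    return {row[index].strip() for row in data_rows if len(row) > index}
```

Keeping the column lookup in its own helper means the user-specified name can be an optional keyword argument with a default of `None`, so existing callers would not need to change if the extra argument is judged not worth adding to args.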

Acceptance Criteria
_sample_csv_to_set looks for an MRN column before assuming MRNs are in row[0].