chenzcv7/MOTOR

how can I obtain the "mimic_train_kg_AO.json" file?


Very cool work! This file includes a "label_index" field. What does it mean?

Looking forward to your reply. Thanks!

Thanks for your interest in our work! Due to copyright restrictions, we cannot upload files derived from MIMIC-CXR. However, you can follow the steps below to preprocess the original MIMIC-CXR dataset:

  1. Obtain the label_index:
    (1) Visit https://physionet.org/content/mimic-cxr-jpg/2.0.0/ and download the file mimic-cxr-2.0.0-chexpert.csv.gz, which contains the label information for each data pair.
    (2) Combine the original json file with the label file mentioned above so that each item carries its label information.
    (3) Convert the label information to a 0-1 array, i.e., if the item belongs to a label, set the corresponding entry to 1; otherwise, set it to 0.

  2. Obtain the knowledge triplets:
    (1) Use Stanza to extract named entities from the medical report in each data pair.
    (2) Use RadGraph to obtain the triplets related to each entity.
    (3) Store all the knowledge triplets (in the format entity-relation-entity) in a list, and add the list to each item of the dataset under the key "triplet".
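
The steps above could be sketched roughly as follows. This is not the authors' preprocessing code, only a minimal illustration under several assumptions: the label order in `CHEXPERT_LABELS` (the order MOTOR expects is not stated here), treating uncertain (-1.0) and blank labels as 0, a `study_id` key on each json record, and RadGraph-style output where each entity has `tokens` and a `relations` list of `[relation, target_id]` pairs. The function names are hypothetical, and running Stanza and RadGraph themselves is left out.

```python
import csv
import gzip

# Label columns of mimic-cxr-2.0.0-chexpert.csv.gz.
# ASSUMPTION: this order; the order used for "label_index" is not given in the thread.
CHEXPERT_LABELS = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Enlarged Cardiomediastinum", "Fracture", "Lung Lesion", "Lung Opacity",
    "No Finding", "Pleural Effusion", "Pleural Other", "Pneumonia",
    "Pneumothorax", "Support Devices",
]


def label_vector(row):
    """Step 1 (3): turn one CSV row into a 0-1 list.

    ASSUMPTION: only an explicit 1.0 counts as positive; 0.0, -1.0
    (uncertain) and blank cells all become 0.
    """
    vec = []
    for name in CHEXPERT_LABELS:
        value = (row.get(name) or "").strip()
        vec.append(1 if value and float(value) == 1.0 else 0)
    return vec


def load_label_lookup(csv_gz_path):
    """Step 1 (1)-(2): read the label file into {study_id: 0-1 list}."""
    with gzip.open(csv_gz_path, "rt", newline="") as fh:
        return {int(row["study_id"]): label_vector(row)
                for row in csv.DictReader(fh)}


def radgraph_triplets(entities):
    """Step 2 (3): flatten RadGraph-style entities into triplets.

    ASSUMPTION: `entities` maps an id to a dict with "tokens" and
    "relations", where each relation is [relation_name, target_id].
    """
    triplets = []
    for ent in entities.values():
        for relation, target_id in ent.get("relations", []):
            target = entities.get(str(target_id))
            if target is not None:
                triplets.append([ent["tokens"], relation, target["tokens"]])
    return triplets


def annotate(record, lookup, entities):
    """Attach "label_index" and "triplet" to one json record.

    ASSUMPTION: the record has an integer-like "study_id"; studies missing
    from the label file get an all-zero vector.
    """
    fallback = [0] * len(CHEXPERT_LABELS)
    record["label_index"] = lookup.get(int(record["study_id"]), fallback)
    record["triplet"] = radgraph_triplets(entities)
    return record
```

With the label file downloaded, `lookup = load_label_lookup("mimic-cxr-2.0.0-chexpert.csv.gz")` would be built once and `annotate` applied to every record before the list is written back with `json.dump`.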

Hope that this answer is helpful to you!

Hi,

Could you please provide the preprocessing code? From the description alone, it is difficult to work out the exact structure of the json file, especially since the original dataset does not include a json file.