ttanida/rgrg

Some questions about the MIMIC-CXR-JPG

Closed this issue · 16 comments

Liqq1 commented

👋 Hi, thanks for your code, and I have some questions about the MIMIC-CXR-JPG to confirm.

Is your MIMIC-CXR-JPG downloaded from here (https://physionet.org/content/mimic-cxr-jpg/2.0.0/), i.e. the JPGs at original size (557.6 GB)?

Or does it contain the 224x224 images? If I use the 224x224 images, do the bbox annotations need to change with the size change?

Thank you in advance. Looking forward to hearing from you ☺️

Hello there! Thank you for your question.

Yes, we used the JPG images of MIMIC-CXR-JPG from the link you provided. As you mentioned, the dataset is 557.6 GB and all the images are of size 2544 × 3056 pixels (as far as I've seen).

Image Transformations

Regarding the image transformations, we used a library called Albumentations to apply the following transformations:

  • LongestMaxSize: This transformation resizes the input image (of size 2544 × 3056) such that the long edge of the image (i.e. the height of 3056) is resized to IMAGE_INPUT_SIZE (we chose this to be 512), while the shorter edge (i.e. the width of 2544) is resized accordingly to maintain the original aspect ratio.
  • Data Augmentation Transformations: We applied several data augmentation transformations (e.g., ColorJitter, GaussNoise, Affine) to the images.
  • PadIfNeeded: We applied this transformation to make sure that all images are of size 512 x 512. This transformation pads both sides of the shorter edge (i.e. the width) to 512 with black pixels.

You can find the train and val transformations that we applied to the images in the function get_transforms in the module train_full_model.py (see line 340) and the module training_script_object_detector.py (see line 479), respectively. The same transformations are applied to the images fed into the object detector / full model.
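For reference, here is a minimal sketch of such a pipeline, assuming Albumentations 1.x (this is not a copy of get_transforms; the augmentation parameters and any normalization used in the repo may differ):

```python
import cv2
import albumentations as A
from albumentations.pytorch import ToTensorV2

IMAGE_INPUT_SIZE = 512

# Sketch of a train-time pipeline: resize the long edge to 512, apply a few
# augmentations, then pad the short edge to 512 with black pixels.
train_transforms = A.Compose([
    A.LongestMaxSize(max_size=IMAGE_INPUT_SIZE),
    A.ColorJitter(),
    A.GaussNoise(),
    A.Affine(rotate=(-2, 2), translate_percent=0.02),
    A.PadIfNeeded(min_height=IMAGE_INPUT_SIZE, min_width=IMAGE_INPUT_SIZE,
                  border_mode=cv2.BORDER_CONSTANT, value=0),
    ToTensorV2(),
])
```

The val pipeline would simply drop the augmentation transforms and keep the resizing and padding.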

Bounding Box Annotations

Regarding your question about the bounding box annotations, yes, they need to change in size according to the transformations applied to the image.

Fortunately, Albumentations automates the corresponding transformations for the bounding boxes. To use this functionality, you need to specify the argument bbox_params=A.BboxParams(format="pascal_voc", label_fields=['class_labels']) in A.Compose (see line 366 of train_full_model.py).

When you apply your image transformation in the dataset class, you can input your bounding box coordinates and labels together with your image (see line 43 of custom_dataset.py). Albumentations ensures that everything is transformed correctly.

Please note that the bounding box annotations should be provided in the format specified by Albumentations. In our implementation, we used the Pascal VOC format, which requires the bounding box coordinates to be in the format [xmin, ymin, xmax, ymax].
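As an illustration, here is a hedged sketch of how such a call could look (same Albumentations 1.x assumption as above; the file path, bbox coordinates, and labels are placeholders, not values from the dataset):

```python
import cv2
import albumentations as A

# Compose with bbox_params so that Albumentations transforms the boxes
# together with the image (Pascal VOC format: [xmin, ymin, xmax, ymax]).
transforms = A.Compose(
    [
        A.LongestMaxSize(max_size=512),
        A.PadIfNeeded(min_height=512, min_width=512, border_mode=cv2.BORDER_CONSTANT, value=0),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
)

image = cv2.imread("example_cxr.jpg", cv2.IMREAD_UNCHANGED)  # placeholder path
bboxes = [[100, 150, 800, 900]]  # one box in [xmin, ymin, xmax, ymax] at the original resolution
class_labels = [1]               # one label per box

transformed = transforms(image=image, bboxes=bboxes, class_labels=class_labels)
image_512 = transformed["image"]        # resized + padded image
bboxes_512 = transformed["bboxes"]      # boxes rescaled/shifted to match
labels_512 = transformed["class_labels"]
```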

For more information on bounding box augmentation with Albumentations, please see https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/.

I hope this helps! Let me know if you have any further questions.

Liqq1 commented

Oh! Thank you very much for your timely and detailed reply! 💕☺️

I now understand how it's handled.
When constructing the csv files, we need to use the 557.6 GB original images. Then, during training/inference, we will resize the original images to 512x512 and process their bboxes at the same time. (I hope I understand correctly.)

This seems to be a really big project. Thanks again for your open-source code and patient reply; I still need time to learn and run it!

Absolutely, you've got it right! Each row in the csv files contains the properties of a single image, and one of these properties is the path to the JPG file of the image in the MIMIC-CXR-JPG dataset folder.

As you correctly mentioned, the images are read in (see line 40 of custom_dataset.py) and the transformations (resizing, data augmentation, padding) of the images (and corresponding bboxes) are applied during training/inference time. Among other reasons, this is done because each training image transformation is random due to the data augmentation (so it can't be done beforehand).
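For illustration, here is a stripped-down sketch of what such a dataset class might look like (this is not the repo's custom_dataset.py; the column names and return format are assumptions):

```python
import cv2
import torch
from torch.utils.data import Dataset

class ChestXrayDatasetSketch(Dataset):
    """Minimal sketch: read the full-resolution JPG at access time and apply the
    Albumentations transforms (which also rescale the bboxes) on the fly."""

    def __init__(self, dataset_df, transforms):
        self.dataset_df = dataset_df  # one row per image: path + bbox columns
        self.transforms = transforms  # an A.Compose with bbox_params set

    def __len__(self):
        return len(self.dataset_df)

    def __getitem__(self, index):
        row = self.dataset_df.iloc[index]
        image = cv2.imread(row["mimic_image_file_path"], cv2.IMREAD_UNCHANGED)
        transformed = self.transforms(
            image=image,
            bboxes=row["bbox_coordinates"],  # [[xmin, ymin, xmax, ymax], ...]
            class_labels=row["bbox_labels"],
        )
        return {
            "image": transformed["image"],
            "bboxes": torch.tensor(transformed["bboxes"], dtype=torch.float),
            "labels": torch.tensor(transformed["class_labels"], dtype=torch.long),
        }
```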

Feel free to reach out if you have any more questions or need further clarification!

Liqq1 commented

Sorry to bother you again, but I want to confirm: how many .json files are there in your scene_graph folder?

I downloaded scene_graph.zip from here (https://physionet.org/content/chest-imagenome/1.0.0/) and unzipped it. My scene_graph folder contains 243,310 .json files.

But when I run create_dataset.py, I get errors that some files cannot be found, for example:

FileNotFoundError: [Errno 2] No such file or directory: '…/silver_dataset/scene_graph/943486a3-b3fa9ff7-50f5a769-7a62fcbb-f39b6da4_SceneGraph.json'

I opened the scene_graph folder to look for these files, but they were not there. I want to know whether the file I downloaded was wrong or whether something went wrong during decompression (I tried twice, but the result was the same, and no errors were reported during decompression).

I also have 243,310 JSON files in the directory chest-imagenome-dataset-1.0.0/silver_dataset/scene_graph

However, I did find the 943486a3-b3fa9ff7-50f5a769-7a62fcbb-f39b6da4_SceneGraph.json in the scene_graph directory, so I'm really not sure why it's missing in your case.
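If it helps, here is a quick sanity check you could run on your copy (adjust the path to your local directory layout):

```python
from pathlib import Path

scene_graph_dir = Path("chest-imagenome-dataset-1.0.0/silver_dataset/scene_graph")

# Count the scene graph JSON files (should be 243,310) and check the one
# that create_dataset.py reported as missing.
json_files = list(scene_graph_dir.glob("*.json"))
print(len(json_files))
print((scene_graph_dir / "943486a3-b3fa9ff7-50f5a769-7a62fcbb-f39b6da4_SceneGraph.json").exists())
```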

Liqq1 commented

Sorry, the above problem was caused by my own oversight. I can now run create_dataset.py, thank you for your help.

Also, I see that you commented out lines 27-28 in full/report_generation_model.py, and I wonder why. When training the full model, shouldn't Faster R-CNN be frozen after loading its parameters?

# path_to_best_object_detector_weights = "/u/home/tanida/runs/object_detector/run_10/weights/val_loss_13.482_epoch_6.pth"
# self.object_detector.load_state_dict(torch.load(path_to_best_object_detector_weights))

No worries! Let me help clarify the training process and the role of the commented lines in report_generation_model.py.

Training Stages Overview (see the repo's README for more details)

We train the full model, which consists of 4 main modules (object detector, 2 binary classifiers, language model), in 3 stages:

  1. Train the object detector.
  2. Train the object detector together with the binary classifiers.
  3. Train the full model end-to-end.

This approach ensures that the language model receives valid input from the preceding modules and can be effectively trained.

Loading Weights

  • Stage 1: The object detector is trained with training_script_object_detector.py
  • Stage 2: To train the object detector and binary classifiers together, uncomment lines 27-28 in report_generation_model.py and add the path to the desired weights of the object detector. Set PRETRAIN_WITHOUT_LM_MODEL = True in run_configurations.py, such that only the first 3 modules (without the language model) are trained. Run python train_full_model.py.
  • Stage 3: To train the full model end-to-end, comment out lines 27-28 in report_generation_model.py and load the weights for the trained object detector + 2 binary classifiers model (from the previous stage) in line 567 of train_full_model.py. Set PRETRAIN_WITHOUT_LM_MODEL = False in run_configurations.py, such that the full model is trained. Run python train_full_model.py.

Note: Commenting out the object detector weights in lines 27-28 in training stage 3 is not strictly necessary since they will be overwritten by the weights loaded in at line 567.

So in summary, the lines 27-28 were commented out because I was probably training the full model the last time I committed that file to GitHub. However, the lines should be uncommented if the weights of the object detector are actually needed (i.e. for the 2nd training stage). I do agree that this part of the code is not that intuitive, and will look into changing it to make it more user-friendly.
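As a rough sketch of this difference (here, model stands for an instance of the full model from report_generation_model.py; the paths and the strict=False choice are my assumptions, not taken from the repo):

```python
import torch

def load_weights_for_stage(model: torch.nn.Module, stage: int) -> None:
    """Sketch of the stage-dependent weight loading described above."""
    if stage == 2:
        # What the uncommented lines 27-28 do inside the model's __init__:
        # load the stage-1 object detector weights into the detector submodule.
        detector_weights = torch.load("path/to/stage_1_object_detector.pth")
        model.object_detector.load_state_dict(detector_weights)
    elif stage == 3:
        # Analogous to line 567 of train_full_model.py: load the stage-2
        # "object detector + binary classifiers" checkpoint into the full model.
        # strict=False because that checkpoint contains no language model weights.
        stage_2_weights = torch.load("path/to/stage_2_checkpoint.pth")
        model.load_state_dict(stage_2_weights, strict=False)
```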

Frozen/Unfrozen parameters

We kept all parameters of each module unfrozen during all training stages. The only exception is the language model, for which we used a technique called pseudo self-attention to train it. In this technique, all parameters of the language model are frozen, except for newly initialized key and value projection matrices (see matrices U_k and U_v in the equation on page 4 of the linked paper). These matrices are used to directly inject the region-level image embeddings (on which the language model is conditioned) into the language model's self-attention module.
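To make the pseudo self-attention mechanism more concrete, here is a heavily simplified, single-head sketch (it ignores multi-head splitting and causal masking, and the class and dimension names are made up; it is not the repo's implementation):

```python
import torch
import torch.nn as nn

class PseudoSelfAttentionSketch(nn.Module):
    """Single-head sketch of pseudo self-attention: the language model's own
    query/key/value projections stay frozen, while newly initialized U_k / U_v
    project the region-level image embeddings into the key/value space, where
    they are prepended to the text keys/values."""

    def __init__(self, d_model: int, d_image: int):
        super().__init__()
        # pretrained (frozen) projections of the language model
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        for proj in (self.w_q, self.w_k, self.w_v):
            for p in proj.parameters():
                p.requires_grad = False
        # newly initialized, trainable projections for the image condition (U_k, U_v)
        self.u_k = nn.Linear(d_image, d_model)
        self.u_v = nn.Linear(d_image, d_model)

    def forward(self, text_hidden: torch.Tensor, image_embeds: torch.Tensor) -> torch.Tensor:
        # text_hidden: (batch, seq_len, d_model); image_embeds: (batch, n_regions, d_image)
        q = self.w_q(text_hidden)
        k = torch.cat([self.u_k(image_embeds), self.w_k(text_hidden)], dim=1)
        v = torch.cat([self.u_v(image_embeds), self.w_v(text_hidden)], dim=1)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        return attn @ v  # (batch, seq_len, d_model)
```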

Upcoming Paper

We plan to upload our paper to arXiv and paperswithcode soon (possibly by the end of this week), which should help clarify the method/code further. I will let you know when the paper is uploaded!

I hope this clears things up for you. If you have any more questions or need further clarification, please don't hesitate to ask.

Liqq1 commented

Very detailed instructions, which helped me a lot! (I also found some explanations in run_configurations.py.)
Can it be summarized as follows?

  • Stage 2:
    uncomment lines 27-28 in report_generation_model.py
    comment out line 567 in train_full_model.py
    PRETRAIN_WITHOUT_LM_MODEL = True

  • Stage 3:
    comment out lines 27-28 in report_generation_model.py
    uncomment line 567 in train_full_model.py
    PRETRAIN_WITHOUT_LM_MODEL = False

Could you provide me with the ‘best_object_detector_weights’? I want to see how the object_detector works.

Looking forward to your paper! I think it's great of you to share such well-organized, reproducible code, and thanks a lot for your contributions.

Your summarization is exactly right.

Regarding the 'best_object_detector_weights', I'm happy to share them with you. I'll upload them over the weekend as I'm a bit busy this week.

Thank you for your interest in our work and for your kind words. I appreciate it!

Paper is uploaded now: https://arxiv.org/abs/2304.08295 🙂

Liqq1 commented

got it! thank you~😊 ☺️

Hi, @Liqq1, have you successfully downloaded the original dataset at https://physionet.org/content/mimic-cxr-jpg/2.0.0/?

Liqq1 commented

Hi, @Liqq1, have you successfully downloaded the original dataset at https://physionet.org/content/mimic-cxr-jpg/2.0.0/?

Hi
I'm still downloading. That's a lot of data. I think it'll take more than ten days.

Hi, @Liqq1, have you successfully downloaded the original dataset at https://physionet.org/content/mimic-cxr-jpg/2.0.0/?

Hi I'm still downloading. That's a lot of data. I think it'll take more than ten days.

Thanks, but it seems that Fudan University is not listed as a valid institution for the CITI Course, which is a prerequisite for the data application, and neither is my school. How did you pass all the requirements for the data application? Please give me some instructions. Thanks.

Btw, I am a big fan of your blogs. May I have your WeChat? (I could send an email to your Gmail.)

Liqq1 commented

No problem, you can tell me your WeChat in the email.