Using self-supervised pretraining to reduce the need for labeled data for medical object detection

This is the code for the paper TODO.

Paper link: TODO

BibTex: TODO

Requirements

  • Python 3.8
  • PyTorch 1.10
  • MMDetection 2.20
  • Lightly SSL 1.2
  • Check environment.yml for more packages.
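To confirm that an existing environment matches the versions listed above, a minimal sketch like the following can compare installed package versions at the major.minor level. It assumes the pip distribution names are `torch`, `mmdet` and `lightly` for PyTorch, MMDetection and Lightly SSL. The helper names are hypothetical, and environment.yml remains the full list of dependencies.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions from the requirements list; the distribution names are assumptions.
REQUIRED = {"python": "3.8", "torch": "1.10", "mmdet": "2.20", "lightly": "1.2"}


def major_minor(v: str) -> str:
    """Return the 'major.minor' prefix of a version string, e.g. '1.10.2+cu113' -> '1.10'."""
    return ".".join(v.split("+")[0].split(".")[:2])


def installed_versions(packages) -> dict:
    """Look up installed versions; packages that are not installed are left out."""
    found = {"python": platform.python_version()}
    for pkg in packages:
        if pkg == "python":
            continue
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            pass
    return found


def find_mismatches(installed: dict, required: dict = REQUIRED) -> dict:
    """Map each package that is missing (None) or has a different major.minor to its installed version."""
    mismatches = {}
    for pkg, want in required.items():
        have = installed.get(pkg)
        if have is None or major_minor(have) != want:
            mismatches[pkg] = have
    return mismatches


if __name__ == "__main__":
    print(find_mismatches(installed_versions(REQUIRED)) or "All listed versions match.")
```

Comparing only major.minor tolerates patch releases and local build tags such as `+cu113`, which usually do not change behavior.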

Data used

Ha Q. Nguyen et al. “VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations”. A preprint is available on arXiv.

Usage

source/ contains all the code. source/pretraining contains the code for self-supervised pretraining. Use and for training and pre-training, and to test on a test dataset.

To prepare the data, download the VinBigData dataset from here:

This is a 512x512 .png version of the original VinDr-CXR dataset.

Store it in a folder named vinbigdata at the root of the repository. Then run source/ to convert the dataset and store it into source/data.
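Before running the conversion, it can help to check that the download landed in the expected place and really is the 512x512 .png version. The sketch below reads each image's width and height from the PNG IHDR header without decoding the pixels. It assumes only that the images are `.png` files somewhere under `vinbigdata/` (any subfolder layout); the function names are hypothetical and not part of this repository.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Read (width, height) from a PNG's IHDR chunk without decoding the image."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height (big-endian).
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def find_wrong_sized(root="vinbigdata", expected=(512, 512)):
    """Return the PNG files under `root` whose size differs from `expected`."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"expected the dataset in {root.resolve()}")
    return [p for p in sorted(root.rglob("*.png")) if png_size(p) != expected]


if __name__ == "__main__":
    bad = find_wrong_sized()
    print(f"{len(bad)} image(s) are not 512x512")
```

Run it from the root of the repository; an empty result means every PNG found has the expected size.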

Check source/ for details on how to run pre-training and training. If you want to pre-train the models, it's important that you use the same experiment name for both pre-training and fine-tuning. Pre-training stores the backbone checkpoint in vinbig_output/<experiment-name>, which is then loaded before fine-tuning begins in