Medical Masked Autoencoders

Paper

This repository provides the official implementation for training Vision Transformers (ViTs) on 2D medical imaging tasks, together with usage instructions for the pre-trained ViTs from the following paper:

Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification
Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou
Johns Hopkins University
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023
paper | code

TO DO

Instructions for preparing datasets.
Instructions for pretraining and fine-tuning.

Image reconstruction demo

Installing Requirements

Our codebase follows the official MAE implementation and uses some additional packages. You can build the environment with either Conda or Pip using the commands below.

Conda:

conda env create -n medical_mae -f medical_mae.yml

Pip:

conda create -n medical_mae python=3.8
conda activate medical_mae
pip install -r requirements.txt
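
After either install route, a quick import check can confirm the environment is usable. The snippet below is a minimal sketch using only the standard library; the package names in `required` are an assumption based on what MAE-style codebases typically need, so treat requirements.txt as the authoritative list.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list for illustration; check requirements.txt for the real one.
    required = ["torch", "torchvision", "timm"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages are importable.")
```

Run it inside the activated `medical_mae` environment; any name it reports as missing can be installed before moving on to pre-training or fine-tuning.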

Preparing Datasets

The MIMIC-CXR, CheXpert, and ChestX-ray14 datasets are publicly available on their official sites. You can download them, or request access to them, under their respective usage agreements.

They can also be downloaded through the following links, for research use only and subject to the official agreements.

MIMIC-CXR (JPG): https://physionet.org/content/mimic-cxr-jpg/2.0.0/

CheXpert (v1.0-small): https://www.kaggle.com/datasets/ashery/chexpert

ChestX-ray14: https://www.kaggle.com/datasets/nih-chest-xrays/data
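
Because the task is multi-label, each image needs a 14-dimensional target vector rather than a single class index. In the NIH ChestX-ray14 release, the metadata file (Data_Entry_2017.csv) stores findings as a pipe-separated string in its "Finding Labels" column, with "No Finding" marking a normal study. The sketch below shows one way to turn such a string into a multi-hot vector; the function name and the class order are assumptions for illustration and may differ from the order this repository's data lists use.

```python
# The 14 disease classes of ChestX-ray14.
# NOTE: this ordering is an assumption; match it to the repository's data lists.
NIH_LABELS = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]


def encode_findings(finding_labels, classes=NIH_LABELS):
    """Convert a pipe-separated findings string into a multi-hot list.

    "No Finding" (or any label not in `classes`) contributes no positive
    entry, so a normal study maps to an all-zero vector.
    """
    present = set(finding_labels.split("|"))
    return [1 if name in present else 0 for name in classes]


if __name__ == "__main__":
    print(encode_findings("Cardiomegaly|Emphysema"))
    print(encode_findings("No Finding"))
```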

Pre-training on ImageNet or Chest X-rays

Pre-training instructions are in PRETRAIN.md.

Fine-tuning with pre-trained checkpoints

Fine-tuning instructions are in FINETUNE.md.
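
When a pre-trained checkpoint is loaded for fine-tuning, its classification head usually does not match the downstream task (for example, 1000 ImageNet classes versus 14 thorax diseases). A common pattern in MAE-style fine-tuning code is to drop head weights whose shapes disagree with the new model before loading the rest. The sketch below illustrates that idea on any mapping whose values expose a `.shape` attribute (as PyTorch tensors do); the function name and the `head.weight` / `head.bias` key names are assumptions, and FINETUNE.md remains the reference for how this repository actually loads checkpoints.

```python
def drop_mismatched_keys(checkpoint_state, model_state,
                         keys=("head.weight", "head.bias")):
    """Return a copy of `checkpoint_state` without head entries whose shape
    differs from the corresponding entry in `model_state`.

    Both arguments are dict-like objects mapping parameter names to values
    with a `.shape` attribute. Only the names listed in `keys` are checked.
    """
    filtered = dict(checkpoint_state)
    for key in keys:
        if key in filtered and key in model_state:
            if tuple(filtered[key].shape) != tuple(model_state[key].shape):
                # The pre-trained head does not fit the new task; skip it so
                # the freshly initialised head is kept.
                del filtered[key]
    return filtered
```

With PyTorch, the filtered dictionary would typically be passed to `model.load_state_dict(filtered, strict=False)` so that the missing head entries are tolerated.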

The table below lists the pre-trained checkpoints used in Table 1 of the paper.

All the weights in the table can also be downloaded at once with this link (google drive).

Model	Pretrained Dataset	Method	Pretrained	Finetuned (NIH Chest X-ray)	mAUC
DenseNet-121	ImageNet	Categorization	torchvision official	google drive	82.2
ResNet-50	ImageNet	MoCo v2	google drive	google drive	80.9
ResNet-50	ImageNet	BYOL	google drive	google drive	81.0
ResNet-50	ImageNet	SwAV	google drive	google drive	81.5
DenseNet-121	X-rays (0.3M)	MoCo v2	google drive	google drive	80.6
DenseNet-121	X-rays (0.3M)	MAE	google drive	google drive	81.2
ViT-Small/16	ImageNet	Categorization	DeiT Official	google drive	79.6
ViT-Small/16	ImageNet	MAE	google drive	google drive	78.6
ViT-Small/16	X-rays (0.3M)	MAE	google drive	google drive	82.3
ViT-Base/16	X-rays (0.5M)	MAE	google drive	google drive	83.0

Model	Pretrained Dataset	Finetuned (NIH Chest X-ray)	mAUC	Finetuned (CheXpert)	mAUC	Finetuned (COVIDx)	Accuracy
ViT-Small/16	X-rays (0.3M)	google drive	82.3	google drive	89.2	google drive	95.2
ViT-Base/16	X-rays (0.5M)	google drive	83.0	google drive	89.3	google drive	95.3
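
The mAUC columns above report the mean of the per-class AUC scores (area under the ROC curve). As a rough illustration of how such a number is obtained for a multi-label classifier, the sketch below computes each class's AUC with the pairwise-ranking definition (the fraction of positive/negative pairs ranked correctly, ties counted as half) and then averages across classes. It is a dependency-free reference sketch with hypothetical function names, not the evaluation code used in this repository, and its quadratic pair loop is only practical for small inputs.

```python
def binary_auc(labels, scores):
    """AUC for one class: probability that a random positive sample is
    scored higher than a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positives and negatives")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(label_rows, score_rows):
    """Mean of per-class AUCs; each row holds one sample's values for all classes."""
    num_classes = len(label_rows[0])
    per_class = [
        binary_auc([row[c] for row in label_rows],
                   [row[c] for row in score_rows])
        for c in range(num_classes)
    ]
    return sum(per_class) / num_classes


if __name__ == "__main__":
    labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.9, 0.4], [0.1, 0.8], [0.8, 0.3], [0.2, 0.1]]
    print(mean_auc(labels, scores))  # class AUCs are 1.0 and 0.75, so mean is 0.875
```

The tables report these values as percentages, so a mean AUC of 0.823 corresponds to an entry of 82.3.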

Citation

If you use this code or our pre-trained weights in your research, please cite our paper:

@inproceedings{xiao2023delving,
  title={Delving into masked autoencoders for multi-label thorax disease classification},
  author={Xiao, Junfei and Bai, Yutong and Yuille, Alan and Zhou, Zongwei},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={3588--3600},
  year={2023}
}

License

This repository is released under the Apache 2.0 license.

Acknowledgement

This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research.

Our code is built upon facebookresearch/mae.

lambert-x/medical_mae