Awesome Medical Vision Language Learning

Datasets

Dataset	Year	Modality	Images	Text
MIMIC-CXR[data][paper]	2019	Chest X-ray	377,110	227,827
CheXpert[data][paper]	2019	Chest X-ray	224,316	224,316
ROCO[data][paper]	2018	CT, Ultrasound, X-Ray, Fluoroscopy, PET, Mammography, MRI, Angiography, PET-CT	81,825	81,825
MedICaT[data][paper]	2020	CT, Ultrasound, X-Ray, Fluoroscopy, PET, Mammography, MRI, Angiography, PET-CT	217,060	217,060

Surveys

VLP: A Survey on Vision-Language Pre-training. arxiv 2022. [paper]
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends. arxiv 2022. [paper]
Beyond Medical Imaging: A Review of Multimodal Deep Learning in Radiology. techrxiv 2022. [paper]

2023

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents. arxiv 2023. [paper][code]
[BiomedCLIP] Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing. arxiv 2023. [paper][model]
Vision-Language Modelling for Radiological Imaging and Reports in the Low Data Regime. MIDL 2023. [paper]
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts. arxiv 2023. [paper][code]
[MRM] Advancing Radiograph Representation Learning with Masked Record Modeling. ICLR 2023. [paper][code]
[BioViL-T] Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing. CVPR 2023. [paper]
MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training. arxiv 2023. [paper][code]

2022

[MGCA] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning. NeurIPS 2022. [paper][code]
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text. EMNLP 2022. [paper][code]
[M3AE] Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training. MICCAI 2022. [paper][code]
Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training. MICCAI 2022. [paper]
Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge. MM 2022. [paper][code]
[MedViLL] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training. JBHI 2022. [paper][code]
[REFERS] Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports. Nature Machine Intelligence 2022. [paper][code]
[BioViL] Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing. ECCV 2022. [paper]
[LoVT] Joint learning of localized representations from medical images and reports. ECCV 2022. [paper]

2021

[Local-MI] Multimodal Representation Learning via Maximization of Local Mutual Information. MICCAI 2021. [paper]
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition. ICCV 2021. [paper]
Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays. arxiv 2021. [paper]

2020

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports. BIBM 2020. [paper]
[ConVIRT] Contrastive Learning of Medical Visual Representations from Paired Images and Text. MLHC 2022. [paper][code]

2018

Unsupervised Multimodal Representation Learning across Medical Images and Reports. NeurIPS workshop 2018. [paper]

2023

Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study. ICLR 2023. [paper]

2022

Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains. NeurIPS workshop 2022. [paper]

2021

[PubMedCLIP] Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain. arxiv 2021. [paper][code]

LViT: Language meets Vision Transformer in Medical Image Segmentation. arxiv 2022. [paper][code]

RoentGen: Vision-Language Foundation Model for Chest X-ray Generation. arxiv 2022. [paper]