Papers and Public Datasets for Medical Vision-Language Learning

Awesome Medical Vision Language Learning

Datasets

| Dataset | Year | Modality | Images | Text |
|---|---|---|---|---|
| MIMIC-CXR [data][paper] | 2019 | Chest X-ray | 377,110 | 227,827 |
| CheXpert [data][paper] | 2019 | Chest X-ray | 224,316 | 224,316 |
| ROCO [data][paper] | 2018 | CT, Ultrasound, X-Ray, Fluoroscopy, PET, Mammography, MRI, Angiography, PET-CT | 81,825 | 81,825 |
| MedICaT [data][paper] | 2020 | CT, Ultrasound, X-Ray, Fluoroscopy, PET, Mammography, MRI, Angiography, PET-CT | 217,060 | 217,060 |

Survey

  • VLP: A Survey on Vision-Language Pre-training. arxiv 2022. [paper]

  • Vision-Language Pre-training: Basics, Recent Advances, and Future Trends. arxiv 2022. [paper]

  • Beyond Medical Imaging: A Review of Multimodal Deep Learning in Radiology. techrxiv 2022. [paper]

Tutorial

  • Vision-Language Pretraining: Current Trends and the Future. ACL 2022. [link]

  • Recent Advances in Vision-and-Language Pre-training. CVPR 2022. [link]

Vision Language Pretraining

Text Encoder

| Text Encoder | Year | Corpus |
|---|---|---|
| BioBERT | 2020 | PubMed |
| ClinicalBERT | 2019 | MIMIC-III |
| PubMedBERT | 2022 | PubMed |
| CXR-BERT | 2022 | PubMed+MIMIC-III/CXR |

How to Train

2023

  • PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents. arxiv 2023. [paper][code]

  • [BiomedCLIP] Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing. arxiv 2023. [paper][model]

  • Vision-Language Modelling for Radiological Imaging and Reports in the Low Data Regime. MIDL 2023. [paper]

  • Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts. arxiv 2023. [paper][code]

  • [MRM] Advancing Radiograph Representation Learning with Masked Record Modeling. ICLR 2023. [paper][code]

  • [BioViL-T] Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing. CVPR 2023. [paper]

  • MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training. arxiv 2023. [paper][code]

2022

  • [MGCA] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning. NeurIPS 2022. [paper][code]

  • MedCLIP: Contrastive Learning from Unpaired Medical Images and Text. EMNLP 2022. [paper][code]

  • [M3AE] Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training. MICCAI 2022. [paper][code]

  • Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training. MICCAI 2022. [paper]

  • Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge. ACM MM 2022. [paper][code]

  • [MedViLL] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training. JBHI 2022. [paper][code]

  • [REFERS] Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports. Nature Machine Intelligence 2022. [paper][code]

  • [BioViL] Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing. ECCV 2022. [paper]

  • [LoVT] Joint learning of localized representations from medical images and reports. ECCV 2022. [paper]

2021

  • [Local-MI] Multimodal Representation Learning via Maximization of Local Mutual Information. MICCAI 2021. [paper]

  • GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition. ICCV 2021. [paper]

  • Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays. arxiv 2021. [paper]

2020

  • A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports. BIBM 2020. [paper]

  • [ConVIRT] Contrastive Learning of Medical Visual Representations from Paired Images and Text. arxiv 2020, MLHC 2022. [paper][code]

2018

  • Unsupervised Multimodal Representation Learning across Medical Images and Reports. NeurIPS workshop 2018. [paper]

How to Use

2023

  • Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study. ICLR 2023. [paper]

2022

  • Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains. NeurIPS workshop 2022. [paper]

2021

  • [PubMedCLIP] Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? arxiv 2021. [paper][code]

Vision Language Task

Refer to Awesome-Multimodal-Applications-In-Medical-Imaging for more papers.

Segmentation

  • LViT: Language meets Vision Transformer in Medical Image Segmentation. arxiv 2022. [paper][code]

Generation

  • RoentGen: Vision-Language Foundation Model for Chest X-ray Generation. arxiv 2022. [paper]