Dataset | Year | Modality | Images | Text |
---|---|---|---|---|
MIMIC-CXR[data][paper] | 2019 | Chest X-ray | 377,110 | 227,827 |
CheXpert[data][paper] | 2019 | Chest X-ray | 224,316 | 224,316 |
ROCO[data][paper] | 2018 | CT, Ultrasound, X-Ray, Fluoroscopy, PET, Mammography, MRI, Angiography, PET-CT | 81,825 | 81,825 |
MedICaT[data][paper] | 2020 | CT, Ultrasound, X-Ray, Fluoroscopy, PET, Mammography, MRI, Angiography, PET-CT | 217,060 | 217,060 |
- VLP: A Survey on Vision-Language Pre-training. arxiv 2022. [paper]
- Vision-Language Pre-training: Basics, Recent Advances, and Future Trends. arxiv 2022. [paper]
- Beyond Medical Imaging: A Review of Multimodal Deep Learning in Radiology. techrxiv 2022. [paper]
- Vision-Language Pretraining: Current Trends and the Future. ACL 2022. [link]
- Recent Advances in Vision-and-Language Pre-training. CVPR 2022. [link]
Text Encoder | Year | Corpus |
---|---|---|
BioBERT | 2020 | PubMed |
ClinicalBERT | 2019 | MIMIC-III |
PubMedBERT | 2022 | PubMed |
CXR-BERT | 2022 | PubMed+MIMIC-III/CXR |
2023
- PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents. arxiv 2023. [paper][code]
- [BiomedCLIP] Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing. arxiv 2023. [paper][model]
- Vision-Language Modelling for Radiological Imaging and Reports in the Low Data Regime. MIDL 2023. [paper]
- Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts. arxiv 2023. [paper][code]
- [MRM] Advancing Radiograph Representation Learning with Masked Record Modeling. ICLR 2023. [paper][code]
- [BioViL-T] Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing. CVPR 2023. [paper]
- MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training. arxiv 2023. [paper][code]
2022
- [MGCA] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning. NeurIPS 2022. [paper][code]
- MedCLIP: Contrastive Learning from Unpaired Medical Images and Text. EMNLP 2022. [paper][code]
- [M3AE] Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training. MICCAI 2022. [paper][code]
- Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training. MICCAI 2022. [paper]
- Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge. MM 2022. [paper][code]
- [MedViLL] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training. JBHI 2022. [paper][code]
- [REFERS] Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports. Nature Machine Intelligence 2022. [paper][code]
- [BioViL] Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing. ECCV 2022. [paper]
- [LoVT] Joint learning of localized representations from medical images and reports. ECCV 2022. [paper]
2021
- [Local-MI] Multimodal Representation Learning via Maximization of Local Mutual Information. MICCAI 2021. [paper]
- GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition. ICCV 2021. [paper]
- Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays. arxiv 2021. [paper]
2020
- A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports. BIBM 2020. [paper]
- [ConVIRT] Contrastive Learning of Medical Visual Representations from Paired Images and Text. MLHC 2022. [paper][code]
2018
- Unsupervised Multimodal Representation Learning across Medical Images and Reports. NeurIPS workshop 2018. [paper]
2023
- Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study. ICLR 2023. [paper]
2022
- Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains. NeurIPS workshop 2022. [paper]
2021
- [PubMedCLIP] Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? arxiv 2021. [paper][code]
Refer to Awesome-Multimodal-Applications-In-Medical-Imaging for more papers.
- RoentGen: Vision-Language Foundation Model for Chest X-ray Generation. arxiv 2022. [paper]