/PubMedCLIP

Fine-tuning CLIP using ROCO dataset which contains image-caption pairs from PubMed articles.

Primary LanguagePythonMIT LicenseMIT

PubMedCLIP in Medical Visual Question Answering

This repository includes PubMedCLIP, the fine-tuned version of CLIP with ROCO image--caption pairs. We also provide the pipelines for encorporating PubMedCLIP as the alternative pre-trained visual encoder in MEVF and QCR medical visual question answering pipelines. Our experiments illustrate that PubMedCLIP results in up tp 3% improvement in the medical visual question answering.

Citation

If you use this work in academic publication, please cite the paper by Sedigheh Eslami, Christoph Meinel, and Gerard de Melo.

BibTeX entry:

@inproceedings{eslami2023pubmedclip,
  title={PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?},
  author={Eslami, Sedigheh and Meinel, Christoph and De Melo, Gerard},
  booktitle={Findings of the Association for Computational Linguistics: EACL 2023},
  pages={1151--1163},
  year={2023}
}