Fine-tuning CLIP on the ROCO dataset, which contains image-caption pairs from PubMed articles.
Primary language: Python. License: MIT.
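
The repository's training code is not shown here, so the following is only a minimal sketch of the symmetric contrastive objective that CLIP-style models are typically fine-tuned with on image-caption pairs. It assumes each image embedding at index k is paired with the caption embedding at index k, that no embedding is the zero vector, and that a fixed temperature of 0.07 is used. The function names (`l2_normalize`, `cross_entropy`, `clip_contrastive_loss`) are hypothetical and chosen for illustration; whether this project uses exactly this loss is an assumption.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against a target index,
    computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[k] and text_embs[k] are assumed to come from the same
    image-caption pair; every other combination is treated as a negative.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # Cosine similarity matrix scaled by the temperature:
    # rows index images, columns index captions.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text: each row should select its own caption.
    loss_i2t = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    # Text-to-image: each column should select its own image.
    loss_t2i = sum(
        cross_entropy([logits[r][k] for r in range(n)], k) for k in range(n)
    ) / n

    return (loss_i2t + loss_t2i) / 2
```

In an actual fine-tuning run the embeddings would come from the CLIP image and text encoders applied to a batch of ROCO images and captions, and the loss would be computed on tensors so that gradients can flow back into the encoders. The pure-Python version above is only meant to make the objective itself easy to read.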