Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization (BMVC 2024 Oral ✨)
Primary Language: Python