This repository contains code for the paper "An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics".
Recently, reference-free metrics such as CLIPScore (Hessel et al., 2021) and UMIC (Lee et al., 2021) have been proposed for automatic evaluation of image captions, demonstrating a high correlation with human judgment. We provide insights into the strengths and limitations of reference-free metrics for image captioning evaluation, guiding future improvements in this area.
- Dataset: Download the dataset from here. We also provide a file containing the scores of all baselines for each metric.
- Code: The code for our study will be released soon.