MMMU-Benchmark/MMMU

Mismatch of the data label in Eval code

XiongweiWu opened this issue · 1 comment

Hi, I am checking your dataset, and the label of "validation_Literature_10" is different from the one in your eval code:

"ground_truth": "D"

where the GT is "D".

However, in the dataset, this data point is:
{'id': 'validation_Literature_10', 'question': 'Refer to the description <image 1>, which term refers to the choices a writer makes, including the combination of distinctive features such as mood, voice, or word choice?', 'options': "['Theme', 'Setting', 'Style', 'Tone']", 'explanation': '', 'image_1': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=340x291 at 0x7FFE9C906620>, 'image_2': None, 'image_3': None, 'image_4': None, 'image_5': None, 'image_6': None, 'image_7': None, 'img_type': "['Icons and Symbols']", 'answer': 'C', 'topic_difficulty': 'Easy', 'question_type': 'multiple-choice', 'subfield': 'Comparative Literature'}
where the GT is "C".

Could you check which one is correct?

The answer in the dataset is correct. Sorry, we gave the wrong answer in the example; we have now fixed it. Thank you for pointing this out.
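
For anyone who wants to catch this kind of mismatch automatically, here is a minimal sketch that compares the labels in an eval example against the dataset's `answer` field. The function name `find_label_mismatches` and the assumed shape of the eval records (a mapping from sample id to a dict with a `ground_truth` key) are illustrative assumptions, not the repository's actual code or file layout; only the field names `id`, `answer`, and `ground_truth` and the two values come from this thread.

```python
def find_label_mismatches(eval_records, dataset_records):
    """Return (id, eval_label, dataset_label) for every id whose labels differ.

    eval_records: assumed shape {sample_id: {"ground_truth": label}} (hypothetical).
    dataset_records: iterable of dicts with "id" and "answer" keys.
    """
    # Index the dataset answers by sample id for direct lookup.
    dataset_answers = {rec["id"]: rec["answer"] for rec in dataset_records}

    mismatches = []
    for sample_id, rec in eval_records.items():
        expected = dataset_answers.get(sample_id)
        # Ids missing from the dataset are skipped rather than reported.
        if expected is not None and rec["ground_truth"] != expected:
            mismatches.append((sample_id, rec["ground_truth"], expected))
    return mismatches


# The two values reported in this issue, reduced to the relevant fields.
eval_records = {"validation_Literature_10": {"ground_truth": "D"}}
dataset_records = [{"id": "validation_Literature_10", "answer": "C"}]

print(find_label_mismatches(eval_records, dataset_records))
```

In this sketch the dataset is treated as the reference, which matches the reply above that the dataset answer is the correct one.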