MMMU-Benchmark/MMMU

There's an error in one of the Correct Examples for genetics

eabase opened this issue · 3 comments

Looking through the Correct Examples, I came across the genetics example, and it is wrong.

[Screenshot: the genetics example on the MMMU site]

  • The numbers don't correspond to what's in the image.
  • Even if the numbers had been correct, the reasoning shown would still be wrong.

It would be interesting to know: how are you guys vetting what is considered correct?

Thank you for bringing this to our attention!
The example you mentioned is not a good case, and we will remove it. However, it will still be counted as a correct answer based on our extraction logic.

@NipElement

However, it will still be counted as a correct answer based on our extraction logic.

Sorry, I don't understand.
Why would you "count" something that is wrong as correct?

Hi all, this is a parsing error in our answer-extraction code. It introduces errors of roughly 1% for some models.
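To make the failure mode concrete, here is a minimal Python sketch of how a loose extraction rule can grade a wrong response as correct. The `naive_extract` function, the sample response and the gold answer are hypothetical illustrations, not MMMU's actual extraction code: the rule simply grabs the first option letter written as `(X)`, so a response that mentions option (A) while reasoning but concludes (C) is scored as A.

```python
import re
from typing import Optional

# Hypothetical naive extractor (NOT MMMU's actual code): it returns the first
# option letter written as "(X)" anywhere in the response, even if that letter
# only appears in the model's reasoning rather than in its final answer.
def naive_extract(response: str, options: str = "ABCD") -> Optional[str]:
    match = re.search(rf"\(([{options}])\)", response)
    return match.group(1) if match else None

# Hypothetical example: the model rejects (A) and concludes (C).
response = (
    "Option (A) assumes independent assortment, which does not hold here, "
    "so the answer is (C)."
)
gold = "A"

extracted = naive_extract(response)
print(extracted, extracted == gold)  # A True: graded correct although the model chose C
```

Because the rule stops at the first match, the wrong conclusion never reaches the grader, which is how an incorrect response can still end up "counted" as correct.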

To avoid this, anyone evaluating their own model can extract the answer from the response themselves and submit the extracted answer directly, instead of the raw response.

That way, these parsing errors are not introduced.
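As a rough illustration of extracting answers yourself before submitting, the Python sketch below only accepts an explicit final-answer statement such as "the answer is (C)" or "Answer: B" and keeps the last one in the response. The regex, the question IDs and the `{question_id: letter}` output shape are all assumptions for illustration; check the MMMU repository's instructions for the submission format its evaluation actually expects.

```python
import re
from typing import Optional

# Hypothetical pattern: "answer is (C)", "answer: C", "Answer: B".
# The option letter must not be followed by another letter, so phrases like
# "the answer is Also..." are not misread as option A.
ANSWER_PATTERN = re.compile(r"[Aa]nswer(?: is|:)\s*\(?([A-D])(?![A-Za-z])")

def extract_answer(response: str) -> Optional[str]:
    """Return the option letter from the LAST explicit answer statement, or None."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1] if matches else None

# Hypothetical raw model responses keyed by hypothetical question IDs.
responses = {
    "validation_Biology_1": "Option (A) fails because of linkage, so the answer is (C).",
    "validation_Biology_2": "Answer: B",
}

# Build the extracted answers; None marks responses to review by hand.
submission = {qid: extract_answer(text) for qid, text in responses.items()}
print(submission)  # {'validation_Biology_1': 'C', 'validation_Biology_2': 'B'}
```

Returning `None` instead of guessing keeps ambiguous responses visible, so they can be checked manually rather than silently scored.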