TIGER-AI-Lab/MMLU-Pro

Add Claude 3.5 Haiku

Closed this issue · 2 comments

Anthropic just released Claude 3.5 Haiku, and I'm very curious how it scores!

They claim 65.0% overall.

Wyyyb commented

Thanks, we have added its evaluation result to our leaderboard and the model output to our git repo.

Thanks!

It’s great that you guys do independent verification, because your measured
62.1% is noticeably lower (by 2.9 percentage points) than the 65.0% that Anthropic claimed.