Add Claude 3.5 Haiku

Question

Anthropic just released Claude 3.5 Haiku, and I’m very curious how it scores!

They claim 65.0% overall.

Answer 1 · 2024-11-16T05:41:29.000Z

Thanks, we have added its evaluation result to our leaderboard and the model output to our git repo.

Answer 2 · 2024-11-16T07:32:57.000Z

Thanks!

It’s great that you guys do independent verification, because your measured
62.1% is 2.9 percentage points lower than the 65.0% that Anthropic claimed.