TIGER-AI-Lab/MMLU-Pro

OpenAI o1-preview and o1-mini


Very curious about the OpenAI o1-preview and o1-mini MMLU-Pro scores. I'm opening this as a tracking issue that people can follow and where updates can be shared.

I headed to the issues page to post the same request, so instead I'll cheer this one on :)

In case it's helpful: compared with other LLMs, o1-preview is (1) remarkably slow, (2) remarkably expensive, and (3) remarkably different from (and sometimes better than) any other LLM I've used. I'm extremely interested in finding out how it performs on MMLU-Pro.

Hmm... this tweet suggests that the TIGER-AI-Lab researchers already evaluated o1-preview last month:
https://x.com/WenhuChen/status/1834605218018754581

We would love to see this data reflected in the GitHub repo!

We were capped by the requests-per-day limit before, but we can definitely rerun it now with a higher quota.
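
For anyone who wants to attempt a run themselves while waiting on the official numbers, here is a minimal sketch of querying o1-preview with exponential backoff to cope with rate caps. This assumes the OpenAI Python SDK (v1); the `ask_o1` helper and retry policy are illustrative, not the repo's official evaluation setup (note that o1 models accept user messages only, with no system prompt or temperature):

```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_o1(question: str, max_retries: int = 5) -> str:
    """Query o1-preview, backing off exponentially on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            resp = client.chat.completions.create(
                model="o1-preview",
                messages=[{"role": "user", "content": question}],
            )
            return resp.choices[0].message.content
        except RateLimitError:
            # Back off exponentially: 1s, 2s, 4s, ...
            time.sleep(2 ** attempt)
    raise RuntimeError("rate limit not cleared after retries")
```

For a full MMLU-Pro run, the repo's own evaluation scripts should be preferred so the prompting and answer extraction match the leaderboard setup.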