OpenAI o1-preview and o1-mini
Opened this issue · 3 comments
Very curious about OpenAI o1-preview and o1-mini MMLU-Pro scores. Opening this issue as a tracking issue that people can follow and updates can be shared in.
I headed to the issues page to post the same request, so instead I will cheer on this one :)
In case it may be helpful: compared with other LLMs, o1-preview is (1) remarkably slow, (2) remarkably expensive, and (3) remarkably different from (and sometimes better than) any other LLM I've used. I'm extremely interested in finding out how it performs on MMLU-Pro.
Hmm... this tweet would seem to suggest that the researchers at TIGER already evaluated o1-preview last month:
https://x.com/WenhuChen/status/1834605218018754581
We would love to see this data reflected in the GitHub repo!
We were capped by the requests-per-day limit before, but we can definitely rerun it now with a higher quota.