TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
Language: Python. License: Apache-2.0.
Issues
Add Tencent Hunyuan-Large (#43)
Add Claude 3.5 Haiku (#42)
Which DeepSeek-Coder-V2? (#39)
Add SmolLM2 1.7B (#38)
New model | Cohere Aya Expanse (#37)
New model | Yi - Lightning (#36)
Add Mistral Small v24.09 (#35)
Add Ministral 3B and 8B (#34)
What is the Arx-0.3 model? (#31)
Llama-3.1-nemotron-70b-instruct (#30)
regarding leaderboard submission (#26)
Add Gemini-1.5-Flash-002 and -Pro-002 (#25)
Add Qwen2.5 model family (#22)
OpenAI o1-preview and o1-mini (#21)
Questionable questions (#16)
Variable length of "options"? (#14)
Possible to remove spam model result (#13)
Add Grok-2? (#12)
Support for standard deviation (#11)
Request for Llama3.1 8B, 70B and 405B (#10)
Duplicates in test split (#6)
Add Gemma 2 9B and 27B (#4)