mtbench101/mt-bench-101

[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

Apache-2.0

Issues

Data generation
#10 opened 3 hours ago
0
Question about the evaluation code.
#9 opened 23 days ago
4
[Bug] During evaluation, {prediction} in origin_prompt is not replaced with the model's response?
#8 opened 3 months ago
3
The OpenCompass implementation's prompt format is not human-friendly
#7 opened 3 months ago
2
Introducing the MT-Bench-101 Beta Version!
#4 opened 3 months ago
0
Call for code and data!
#3 opened 3 months ago
1
The paper has been published and the repository is listed in it, yet the GitHub repository is empty
#2 opened 3 months ago
1