mtbench101/mt-bench-101
[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
Apache-2.0
Issues
- 0
Data generation
#10 opened - 4
Question about the evaluation code.
#9 opened - 3
[Bug] During evaluation, {prediction} in origin_prompt is not replaced with the model's response?
#8 opened - 2
The prompt format in the OpenCompass implementation is not human-friendly
#7 opened - 0
- 1
Call for code and data!
#3 opened - 1
The paper has been published and mentions the repository, but the GitHub repository is empty
#2 opened