open-compass/T-Eval
[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
Python, Apache-2.0
Issues
- 2
Can not eval when set batch_size>1
#48 opened by dkqkxx - 1
When do you want to support internlm2
#43 opened by seanxuu - 2
Could you provide test data fully aligned with the OpenAI input format?
#49 opened by Watebear - 1
Llama2 7b chat model: input length exceeds 4096
#44 opened by Watebear - 1
Review evaluation metric is distorted; Qwen is severely underestimated
#53 opened by fengzhu1 - 2
Question about the number of cases in the dataset
#50 opened by AmberXu98 - 0
Is qwen1.5 supported? Reproduced results differ significantly
#51 opened by Little-girl-1992 - 0
Questions about T-Eval
#45 opened by Cppowboy - 0
How to use multi-gpu to test?
#42 opened by seanxuu - 0
Error when running python test.py with qwen14B
#40 opened by chococatsrin - 0
qwen1.5 tokenizer error
#39 opened by chococatsrin - 0
Question about qwen-14b evaluation results
#38 opened by Fenglly - 3
- 0
Evaluate Claude 3
#37 opened by stalkermustang - 0
Issue with the answers in the plan_json_v1_zh.json data file
#36 opened by 13416157913 - 0
Issue with the answers in the plan_json_v1_zh.json data file
#35 opened by 13416157913 - 3
API model ERROR
#31 opened by HC-Guo - 3
[BUG] RuntimeError: The size of tensor a (8192) must match the size of tensor b (8193) at non-singleton dimension 3
#30 opened by Ayooooo - 5
BUG: stop_words
#33 opened by ZHUANGMINGXI - 0
BUG: stop_words
#32 opened by ZHUANGMINGXI - 2
Hi everyone, I have a question about the T-Eval evaluation dataset and would appreciate your help. Thanks.
#29 opened by 13416157913 - 13
- 5
Question about the Tool Set
#27 opened by yitianlian - 2
Question about open-sourcing the data
#26 opened by pengming617 - 2
- 1
Does the benchmark include a test of LLM translation ability? If so, which item is it?
#16 opened by White-Friday - 1
Hello, roughly how long does one evaluation round on the Chinese dataset take?
#18 opened by 13416157913 - 4
What is the difference between plan and instruct?
#17 opened by milk-bottle-liyu - 1
About review metrics
#6 opened by DryPilgrim - 1
About dataset statistics & tool generation
#5 opened by DryPilgrim - 1
About the relationship between the six capabilities
#3 opened by Emperorizzis - 1
How do I determine the metrics in Table 1 of the paper from the test result files? Is it the F1 score?
#7 opened by DryPilgrim - 4
Questions about evaluation speed and results
#21 opened by klykq111 - 1
Adding T-Eval to the open-compass framework
#20 opened by merlinarer - 9
vllm compatibility issue
#11 opened by Double-bear - 1
Issue with the message format when testing QWen
#14 opened by gewenbin0992 - 2
Question about the model inference format
#4 opened by Double-bear - 2
About test_num
#2 opened by DryPilgrim - 3
How to submit model results to T-Eval?
#1 opened by magicsongyang