Issues
[Feature] Add a LICENSE to the project
#141 opened by cjoverbay - 1
Will llama3, wizardlm2, and other recent models be tested and published on the leaderboard? (Request to add test results for large models released in April to May 2024, such as llama3 and wizardlm)
#136 opened by dercaft - 1
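[Feature] How is the score for each task calculated? For example, the OS task only yields an accuracy, but in Table 3 of the paper every task has a score. I couldn't find how one maps to the other anywhere in the paper; could you give a hint?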
#135 opened by lonerFarea - 5
How can I run the tests with a local llama-2-hf model? I'd appreciate some clear guidance! [Bug/Assistance]
#133 opened by 5456es - 5
[Bug/Assistance]
#109 opened by ibingzhaoi - 2
Add evaluation for Claude 3
#126 opened by webdxq - 1
Is testing through OpenAI's tool_call interface supported?
#132 opened by Maybewuss - 1
Excellent job! Well, no offense, but in essence it seems more like an LLM-Bench than an AgentBench.
#130 opened by Konisberg - 1
[Bug/Assistance] What's going on with "unknown" in mind2web?
#129 opened by Tangent-90C - 4
OS std test set results
#128 opened by webdxq - 3
Connection error
#124 opened by StupiddCupid - 3
The Card_Game task won't run
#121 opened by yupeijei1997 - 17
How should I fix this? I'm running mind2web and am not sure how to operate this task. Could you give some specific guidance? Thanks
#119 opened by Ethan-2004 - 1
Benchmark for mistral models
#122 opened by mingxuan-he - 4
[Assistance] Connection Error
#86 opened by wz1211 - 1
[Bug/Assistance] "result": {"answer": "1049 (42000): Unknown database 'Football Matches'", "type": "UPDATE", "error"
#111 opened by 13416157913 - 2
[Bug/Assistance] OS task fails with AttributeError: 'NpipeSocket' object has no attribute '_sock'
#112 opened by 13416157913 - 6
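[Bug/Assistance] Why does the dbbench task create only a single database named "unknown" in MySQL, containing just one table that is also named "unknown"? Is something wrong with the initialization?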
#114 opened by 13416157913 - 2
I'd like to look at the functions that handle interaction between the agent and the server; could you point me to them?
#92 opened by hushuang909 - 1
Both cg and kg hit "Worker not responding"
#97 opened by WarBean - 1
[Bug/Assistance] One os-std sample fails with "Worker not responding"
#105 opened by Xccanxin - 1
[Assistance] Need some example running logs
#103 opened by ROCKYWWWW - 1
About Webshop
#91 opened by dapengchen1234 - 3
Game tasks fail to start [Assistance]
#96 opened by smartliuhw - 1
[Bug/Assistance] DBBench Unknown database
#106 opened by LittleWhite0208 - 1
Can AgentBench be run on the training set?
#107 opened by Fu-Dayuan - 1
Building the package image hangs after selecting a time zone. What's causing this? Rebuilding doesn't help either.
#104 opened by lidian1234 - 1
Why is there no overall.json in the output folder?
#101 opened by tml2002 - 0
Why is there no overall.json in the output folder?
#100 opened by tml2002 - 0
[Bug/Assistance]
#99 opened by tml2002 - 0
[Bug/Assistance]
#98 opened by tml2002 - 2
Can the environment be set up without Docker?
#93 opened by smartliuhw - 1
Not a single cg task succeeds, and the task server receives no messages at all
#87 opened by Jianzhao-Huang - 0
Confirming KBQA task dataset information
#88 opened by WuXuan374 - 1
How to test on custom data?
#83 opened by Reason-Wang - 3
[Bug/Assistance] The option link fails to jump
#85 opened by zhimin-z - 2
Separate server for task and model
#81 opened by Reason-Wang - 1
[Assistance] How do I get the score for each task?
#80 opened by Jiaqi0109 - 1
How to calculate the overall score?
#79 opened by zhimin-z