About dataset statistics & tool generation

Question

Closed this issue 3 months ago · 1 comment

I'd like to ask the authors the following questions. Many thanks for your answers :-)

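1. The paper says "pick 553 instruction-solution pairs after two-round human verification". Why is REVIEW 487 in the table rather than 553? What kind of instruction-solution pairs ultimately don't need to be evaluated?
2. In the table, RETRIEVE is 6426 (does that mean 6426 tools were retrieved?) and UNDERSTAND is 6753 (does filling in the parameters of each tool require one UNDERSTAND step?).
3. If 2 is correct, why aren't RETRIEVE and UNDERSTAND equal?
4. The paper says there are 15 tools. How were these tools constructed (or where were they sampled from)? I don't see any tool-generation code in the repo, nor any file dedicated to storing the tools.
5. Is there any description or statistics for the sub-tools (e.g., PPT.create_file counts as one sub-tool)?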

The statistics of the evaluation datasets in T-Eval

| Dataset    | Test Cases |
|------------|-----------:|
| INSTRUCT   | 2660       |
| RETRIEVE   | 6426       |
| PLAN       | 553        |
| REASON     | 6426       |
| REVIEW     | 487        |
| UNDERSTAND | 6753       |
| Total      | 23305      |
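As a quick sanity check on the table (a sketch added for illustration, not code from the T-Eval repo), the per-dimension counts can be summed in Python. They add up to the reported total of 23305 only when RETRIEVE is 6426, and the script also prints the two gaps the questions above ask about (PLAN vs. REVIEW, and UNDERSTAND vs. RETRIEVE).

```python
# Test-case counts per evaluation dimension, copied from the T-Eval statistics table.
counts = {
    "INSTRUCT": 2660,
    "RETRIEVE": 6426,
    "PLAN": 553,
    "REASON": 6426,
    "REVIEW": 487,
    "UNDERSTAND": 6753,
}

# Sum every dimension and compare against the table's Total row.
total = sum(counts.values())
print(f"Total: {total}")  # prints "Total: 23305"

# Gaps between the dimensions the questions compare.
plan_minus_review = counts["PLAN"] - counts["REVIEW"]
understand_minus_retrieve = counts["UNDERSTAND"] - counts["RETRIEVE"]
print("PLAN - REVIEW:", plan_minus_review)                  # prints "PLAN - REVIEW: 66"
print("UNDERSTAND - RETRIEVE:", understand_minus_retrieve)  # prints "UNDERSTAND - RETRIEVE: 327"
```

So 66 of the 553 verified pairs have no REVIEW case, and UNDERSTAND has 327 more cases than RETRIEVE; the table itself does not say why.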

Answer 1 · 2024-01-15T02:26:43.000Z