night-chen/ToolQA

ToolQA is a new dataset for evaluating the capabilities of LLMs in answering challenging questions with external tools. It offers two difficulty levels (easy and hard) across eight real-life scenarios.

Jupyter Notebook · Apache-2.0

Issues

Coffee dataset
#7 opened 8 months ago by xschen-beb
0
The download link for the Coffee raw data is no longer functioning.
#6 opened a year ago by Chen-GX
0
About the evaluation
#5 opened a year ago by zhangzhen-research
0
About **Programmatic Answer Generation** Part
#4 opened 2 years ago by yc1999
1
Expecting the benchmark code!
#3 opened 2 years ago by hzy312
1
The release of tool code
#2 opened 2 years ago by hsaest
2
the question and answer are mismatch
#1 opened 2 years ago by Ericmututu
1