night-chen/ToolQA
ToolQA is a new dataset for evaluating the ability of LLMs to answer challenging questions using external tools. It offers two difficulty levels (easy and hard) across eight real-life scenarios.
Jupyter Notebook · Apache-2.0
Stargazers
- allanjSalesforce Research
- allen3ai
- alreadydoneHeidelberg / Shenzhen
- chuanmingliuWesteros
- dapurv5@amazon-science
- fly51flyPRIS
- GanbenC Lab
- haotiansun14Georgia Institute of Technology
- hsaestFudan University
- JeffCarpenterCanada
- JieyuZ2University of Washington
- kuan-wangMicrosoft
- Kunlun-ZhuMila-Quebec AI Institute; UdeM
- liujiahengAlibaba Group
- luciusssssPeking University
- Magnetic2014Tianjin University, Tianjin, China
- Nardien
- odpPetuum, Inc.
- OpenAndrusAndrusB
- qianchen94
- raphaelcosta@anti-work
- RobertMartonMicrosoft
- ronch99The Ohio State University
- SandalotsVolcanak
- sasikiranValueLabs, LLP
- shenwzh3Sun Yat-sen University
- sherylke
- SidUMicrosoft
- tma15Kanagawa, Japan
- vishaal27University of Tübingen | University of Cambridge
- WadeYin9712UCLA
- wang-debug
- yaqingwangGoogle Deepmind
- YelZhang
- yueyu1030Georgia Institute of Technology
- yz-liu