mazzzystar/TurtleBench
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles.
Jupyter Notebook · Apache-2.0
Issues
- 3
Prompt for the case where the user's question has already revealed the puzzle's hidden story (汤底)
#8 opened - 2
CoT prompts
#4 opened - 1
another implementation method
#2 opened - 4
Chinese reproduction failed, off by more than 5 points
#1 opened