mainlp/TruthQuest
We introduce TruthQuest, a benchmark designed to evaluate the suppositional reasoning capabilities of large language models through knights and knaves puzzles.
Python · CC-BY-SA-4.0
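
For readers unfamiliar with the puzzle type named in the description, here is a minimal sketch of how a knights and knaves puzzle can be checked by brute force: suppose each possible assignment of knight (always truthful) or knave (always lying) and keep those that contradict no statement. This is an illustration only, not TruthQuest's own generator, data format, or evaluation code; the `solve` function, the `names` and `statements` structures, and the two-character puzzle are all hypothetical.

```python
from itertools import product


def solve(names, statements):
    """Return every knight/knave assignment consistent with all statements.

    names: list of character names (hypothetical structure).
    statements: dict mapping a speaker to a function that takes an
        assignment (name -> True if knight) and returns whether the
        speaker's claim holds under that assignment.
    """
    solutions = []
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # A knight's claim must be true and a knave's claim must be false,
        # so each claim's truth value has to equal the speaker's role.
        if all(claim(world) == world[speaker]
               for speaker, claim in statements.items()):
            solutions.append(world)
    return solutions


# Hypothetical example puzzle with two characters.
names = ["A", "B"]
statements = {
    "A": lambda w: not w["B"],        # A: "B is a knave."
    "B": lambda w: w["A"] == w["B"],  # B: "A and I are the same kind."
}

solutions = solve(names, statements)
print(solutions)  # [{'A': True, 'B': False}]
```

The only consistent assignment makes A a knight and B a knave. Exhaustive search is exponential in the number of characters, which is acceptable for the small puzzles typical of this genre.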