mainlp/TruthQuest

We introduce TruthQuest, a benchmark designed to evaluate the suppositional reasoning capabilities of large language models through knights and knaves puzzles.

Language: Python
License: CC-BY-SA-4.0
