/TurtleBenchmark

A novel LLM benchmark that focuses on evaluating model reasoning and understanding.

Primary Language: Python
