chen9z/TurtleBenchmark
A novel LLM benchmark focused on evaluating model reasoning and understanding.
Python