chen9z/TurtleBenchmark
A novel LLM benchmark focused on evaluating model reasoning and understanding.
Python
No issues in this repository yet.