/math-evaluation-harness

A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨

Primary language: Python
License: MIT

This repository is not active.