MARIO EVAL: A mathematical dataset evaluation toolkit

This is the official repository for the paper MARIO Eval. We fixed several bugs in the original latex2sympy and extended its ANTLR parser grammar to support more LaTeX expressions.

Evaluation on MATH dataset

| Model                     | Accuracy | Reported |
|---------------------------|----------|----------|
| MathCoder-CL-7B           | 0.3064   | 0.3074   |
| MathCoder-CL-34B          | 0.4584   | 0.461    |
| ToRA-Code-34B             | 0.5136   | 0.51     |
| ToRA-70B                  | 0.5014   | 0.497    |
| DeepSeek-Math-Base-7B     | 0.3318   | 0.3142   |
| DeepSeek-Math-Instruct-7B | 0.572    | 0.575    |
| DeepSeek-Math-RL-7B       | 0.596    | 0.5878   |

Features

  • SymPy-based equivalence checking of two math expressions; see is_equiv
  • annotation of the MATH test set for more robust evaluation; see data/math_testset_annotation.json and demo.py
  • integration with LLMs
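The annotation file can be iterated with the standard library alone. A minimal sketch is below; note that the record field names used here (`problem`, `answer`) are assumptions for illustration — inspect data/math_testset_annotation.json for the actual schema.

```python
import json

def load_annotations(path):
    """Load an annotated test set stored as a JSON list of records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def iter_pairs(records):
    """Yield (problem, gold answer) pairs from annotation records.

    NOTE: the field names "problem" and "answer" are illustrative
    assumptions; check the real file for its schema.
    """
    for rec in records:
        yield rec.get("problem"), rec.get("answer")
```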

Requirements

  1. sympy==1.12
  2. antlr4-python3-runtime==4.11.1
  3. gmpy2 must NOT be installed
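These pins can be sanity-checked programmatically. The helper below is a hypothetical illustration, not part of the toolkit; feed it a mapping of installed package names to versions (e.g. built from `importlib.metadata`).

```python
# Hypothetical helper (not part of MARIO Eval): check a mapping of
# installed package versions against the pins listed above.
def check_requirements(installed):
    """installed: dict of package name -> version string.

    Returns a list of human-readable problems; empty means OK.
    """
    problems = []
    if installed.get("sympy") != "1.12":
        problems.append("sympy must be ==1.12")
    if installed.get("antlr4-python3-runtime") != "4.11.1":
        problems.append("antlr4-python3-runtime must be ==4.11.1")
    if "gmpy2" in installed:
        problems.append("gmpy2 must NOT be installed")
    return problems
```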

Use without installation

> git clone https://github.com/MARIO-Math-Reasoning/MARIO_EVAL.git
> cd MARIO_EVAL
> python
>>> from latex2sympy.latex2sympy2 import latex2sympy
>>> latex2sympy("\\frac12")
1/2
>>> from math_evaluation import is_equiv 
>>> is_equiv("1\\frac12", "1.5")
True
>>> is_equiv("\\begin{pmatrix} 1 & \\frac12 \\\\ 1/3 & \\sqrt4 \\end{pmatrix}", 
...          "[[1.0, 1/2],[0.3333, 2.0]]")
True
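As the matrix example shows, numeric entries such as 0.3333 match 1/3 only up to a tolerance. A rough stdlib-only stand-in for that numeric comparison is sketched below; the function name and tolerance are illustrative assumptions, not the toolkit's actual logic, which parses full LaTeX via latex2sympy.

```python
from fractions import Fraction

def numeric_equiv(a: str, b: str, tol: float = 1e-3) -> bool:
    """Compare two numeric strings (decimals or p/q fractions) up to tol.

    Illustrative stand-in for the numeric path of is_equiv; the real
    check compares SymPy objects parsed from LaTeX.
    """
    def to_float(s: str) -> float:
        s = s.strip()
        # Fraction handles "p/q" exactly; plain float handles decimals.
        return float(Fraction(s)) if "/" in s else float(s)
    return abs(to_float(a) - to_float(b)) <= tol
```

For example, `numeric_equiv("1/3", "0.3333")` succeeds while `numeric_equiv("1/3", "0.35")` does not.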

Install as a Python package

> git clone https://github.com/MARIO-Math-Reasoning/MARIO_EVAL.git
> cd MARIO_EVAL
> cd latex2sympy && pip install . && cd ..
> pip install -e .

Unittest

python -m unittest math_evaluation/tests/test_is_equiv.py
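To add your own cases, follow the pattern of test_is_equiv.py. The sketch below uses a trivial stand-in comparator so it runs standalone; in the real test file you would import is_equiv from math_evaluation instead.

```python
import unittest

# Stand-in so this sketch runs without the package installed;
# replace with: from math_evaluation import is_equiv
def is_equiv(ground_truth: str, prediction: str) -> bool:
    return ground_truth.strip() == prediction.strip()

class TestIsEquiv(unittest.TestCase):
    def test_identical_strings(self):
        self.assertTrue(is_equiv("1/2", "1/2"))

    def test_different_strings(self):
        self.assertFalse(is_equiv("1/2", "1/3"))
```

Run it the same way as the toolkit's own tests, i.e. `python -m unittest path/to/your_test.py`.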

Citation

Please cite our paper if you use our data or code.

@misc{zhang2024mario,
      title={MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit}, 
      author={Boning Zhang and Chengxi Li and Kai Fan},
      year={2024},
      eprint={2404.13925},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}