proofsearch

Experiments with Llemma and hypertree proof search (HTPS) for Lean4.

Relies on

Baseline results

Run bash scripts/eval_llemma7b.sh to run Llemma 7b (with awq quantization) on the minif2f validation data. Compute the accuracy with python compute_metrics.py. This should get around 26% accuracy.