This tools intention is to help AI developers to parse the mathlib4 theorems and enable them to test their AIs with an interactive lean testing enviroment.
Run
git submodule update --init --recursive
python -m venv venv
source venv/bin/activate
to run all scripts and tests without explict imports, export the path src like this:
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
Install lean and lake as suggested here
make test-unit
make test-e2e
Modify the function generate_training_data
in src/generate/training_data according to your needs.
Test your modified funciton by running its e2e test:
python -m pytest tests/e2e/test_create_training_data.py
and check that the output format is as you want it to be.
To start the overall export on all mathlab files, clone mathlib4 github repo:
git clone git@github.com:leanprover-community/mathlib4.git
Then you can run the following script to extract the data for all files:
python scripts/generate_training_data/raw_training_data.py <Path to the cloned mathlib4 repo, e.g. ~/coding/ai/lean/mathlib4/Mathlib >
It will generate an output.json file in the project directory for you with your data.
Much of the code is copied and inspired by Lean Dojo. It was mostly reorganized to serve a bigger audience and be faster to execute.