LLM verified with Monte Carlo Tree Search

This prototype synthesizes verified code with an LLM.

Using Monte Carlo Tree Search (MCTS), it explores the space of possible generations of a verified program, checking at every step that it is on the right track by calling the verifier.
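
The following is a minimal, self-contained sketch of that loop: MCTS over partial programs, where every expansion is scored by a call to the verifier. The names Node, generate_step, and verify are illustrative stand-ins, not the actual interfaces of this repository.

# Sketch only: MCTS-guided generation with verifier feedback at every step.
import math
import random

class Node:
    def __init__(self, text, parent=None):
        self.text = text          # partial program accumulated so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # accumulated verifier-based reward
        self.depth = 0 if parent is None else parent.depth + 1

    def ucb(self, c=1.4):
        # Upper Confidence Bound, balancing exploration and exploitation.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def generate_step(text):
    # Stand-in for one LLM continuation of the partial program.
    return text + random.choice(
        ["  assert true;\n", "  assert false;\n", "  print 0;\n"])

def verify(text):
    # Stand-in for the Dafny/Coq/Lean verifier: 1.0 while the partial
    # program still checks, 0.0 once it is rejected.
    return 0.0 if "assert false" in text else 1.0

def mcts(root_text, iterations=200, branching=3, max_depth=6):
    root = Node(root_text)
    for _ in range(iterations):
        # 1. Selection: descend by UCB while nodes are fully expanded.
        node = root
        while node.depth < max_depth and len(node.children) == branching:
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: if not at maximum depth, sample one more LLM step.
        if node.depth < max_depth:
            child = Node(generate_step(node.text), parent=node)
            node.children.append(child)
            node = child
        # 3. Evaluation: call the verifier on the (partial) program.
        reward = verify(node.text)
        # 4. Backpropagation: push the verifier signal up the tree.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Follow the most-visited path to read off the chosen program.
    node = root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
    return node.text

print(mcts("method Main() {\n"))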

This prototype uses Dafny, Coq, or Lean. To select the language, uncomment the corresponding line in lang.py.
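
As a purely hypothetical illustration of that uncomment-one-line pattern (the actual contents of lang.py are laid out differently), the selection might look like:

# Hypothetical sketch of the selection pattern described above.
LANG = "Dafny"
# LANG = "Coq"
# LANG = "Lean"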

Logs for example runs can be found in the log directory. Scroll to the very end of a log to see the chosen solution. Note that the chosen solution is optimal for the problem.

By using this technique, weaker models that may not know the target verification language very well can compete with stronger models.

We can also use PPO training to positively reinforce the snippets that succeed and negatively reinforce those that fail. After PPO training, the model can solve the prompts without backtracking! See, for example, the log for solving the problem fact after PPO training on another problem, opt0.
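
Below is a minimal sketch of that reward scheme, under the assumption that a snippet's reward is simply +1 when it verifies and -1 when it fails; verify() is an illustrative stand-in, and the real PPO loop lives in run_ppo.py.

# Sketch only: verifier-based rewards for PPO training.
def verify(snippet: str) -> bool:
    # Stand-in for invoking the Dafny/Coq/Lean verifier on a snippet.
    return "assert false" not in snippet

def reward(snippet: str) -> float:
    # Positive reinforcement for verified snippets, negative for failures.
    return 1.0 if verify(snippet) else -1.0

# A PPO trainer (e.g. Hugging Face TRL's PPOTrainer) would consume these
# per-snippet rewards to push the policy toward verified completions.
rollouts = [
    "method fact(n: nat) returns (r: nat) { r := 1; }",
    "method fact(n: nat) returns (r: nat) { assert false; }",
]
print([reward(s) for s in rollouts])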

Running

This project relies on GPU access. It has been tested on a multi-GPU machine with two NVIDIA A100s.

If you don't have access to a GPU, you can still run the project using GPT-4, but you will need to use llm.gpt4 instead of llm.generate in run.py, run_user.py, and run_verifier_feedback.py.
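
As a rough, hedged illustration of that swap (the exact call sites and argument shape in run.py are an assumption here, not the repository's confirmed signature):

# Hedged sketch of the swap described above; the argument shape of
# llm.generate / llm.gpt4 is an assumption, so check the real call sites
# in run.py, run_user.py, and run_verifier_feedback.py.
import llm

prompt = "..."  # whatever prompt the driver has assembled

# GPU-backed local model (default):
# completion = llm.generate(prompt)

# GPT-4 over the OpenAI API instead (no local GPU needed):
completion = llm.gpt4(prompt)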

Setup

Using mamba (or, equivalently, conda):

mamba create --name llm-verified python=3.10
mamba activate llm-verified
pip install -r requirements.txt

(If you want to use Lean) Install Lean; see the detailed instructions here. Then run lake build in the repl directory.

(If you want to use GPT-4) Set "OPENAI_API_KEY" in your environment variables to your OpenAI API key.

Execution

Pick a language in lang.py, an LLM in model.py, and a prompt in prompts.py, then:

python run.py

For the run that interacts with the user, do:

python run_user.py

For the run that feeds back verifier info, do:

python run_verifier_feedback.py

For the PPO trainer (slow!), do:

python run_ppo.py

Credits