Runs multi-agent debate with open-source HuggingFace models on the Arithmetic problem.
gen_math.py runs debates with a HuggingFace model (e.g. mistralai/Mistral-7B-Instruct-v0.2).
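For orientation, the round structure of a multi-agent debate can be sketched roughly as below. This is a minimal illustration, not the code in gen_math.py: the `run_debate` function, the `GenerateFn` type, the prompt wording, and the stub agents are all assumptions made for this example. The stubs stand in for real HuggingFace models so the sketch runs without downloading anything.

```python
from typing import Callable, List

# Hypothetical interface: an agent takes a prompt string and returns a reply.
GenerateFn = Callable[[str], str]


def run_debate(question: str, agents: List[GenerateFn], rounds: int) -> List[List[str]]:
    """Return history[r][i], the answer given by agent i in round r."""
    history: List[List[str]] = []
    for r in range(rounds):
        answers = []
        for i, generate in enumerate(agents):
            if r == 0:
                # First round: every agent answers the question independently.
                prompt = question
            else:
                # Later rounds: each agent sees the other agents' previous answers.
                others = [a for j, a in enumerate(history[-1]) if j != i]
                prompt = (
                    question
                    + "\nOther agents answered:\n"
                    + "\n".join(others)
                    + "\nUsing these answers as additional information, give an updated answer."
                )
            answers.append(generate(prompt))
        history.append(answers)
    return history


def make_stub(name: str) -> GenerateFn:
    """Stub agent (assumption): reports how many lines of prompt it received."""
    def generate(prompt: str) -> str:
        return f"{name} read {len(prompt.splitlines())} lines"
    return generate


history = run_debate("What is 2+3*4?", [make_stub("A"), make_stub("B")], rounds=2)
```

In a real run, each stub would be replaced by a function that calls a loaded model, and the final-round answers would be parsed and scored against the ground truth.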
To reproduce the figure in the original paper (scaling with rounds and agents):
- set agents/rounds and run the ./gen_math.sh training script
- generate figures in outputs.ipynb
gen_math_panel.py and ./gen_math.sh are modified to run a panel experiment with several different HuggingFace models. Models are chosen from a list of available options by passing their indices as command-line arguments.
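Selecting models by index could look something like the sketch below. The `--models` flag, the `parse_panel` helper, and the placeholder entries in `AVAILABLE_MODELS` are hypothetical and chosen only for illustration. Only the Mistral model name comes from this README, and the actual argument names in gen_math_panel.py may differ.

```python
import argparse
from typing import List, Optional

# Hypothetical list of options; only the first entry is named in this README.
AVAILABLE_MODELS = [
    "mistralai/Mistral-7B-Instruct-v0.2",
    "example-org/model-b",  # placeholder, not a real model id
    "example-org/model-c",  # placeholder, not a real model id
]


def parse_panel(argv: Optional[List[str]] = None) -> List[str]:
    """Map integer indices from the command line to model names."""
    parser = argparse.ArgumentParser(description="Select a panel of models by index.")
    parser.add_argument(
        "--models",  # hypothetical flag name
        type=int,
        nargs="+",
        required=True,
        help="indices into AVAILABLE_MODELS, one per agent",
    )
    args = parser.parse_args(argv)
    for idx in args.models:
        # Reject indices outside the list instead of failing later with IndexError.
        if not 0 <= idx < len(AVAILABLE_MODELS):
            parser.error(f"model index {idx} is out of range 0..{len(AVAILABLE_MODELS) - 1}")
    return [AVAILABLE_MODELS[i] for i in args.models]


# Passing argv explicitly keeps the example independent of the real command line.
panel = parse_panel(["--models", "0", "2"])
```

Repeating an index, for example `--models 0 0 0`, would give a panel made of several copies of the same model.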
Reproduced figures
Testing with a diverse panel of open-source HuggingFace models
model | Arithmetic (%) | std |
---|---|---|
Single Agent (Mistral) | 16 | 7.3 |
Single Agent Panel (Mistral) | 28 | 8.9 |
Multi Agent Panel | 24 | 8.5 |
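The table does not say how the std column is computed. One possible reading, assumed here and not confirmed by the source, is that each problem is scored 1 (correct) or 0 (incorrect) and that the reported spread is the standard error of the mean accuracy. The `accuracy_and_sem` helper and the `scores` list below are hypothetical and exist only to show the arithmetic.

```python
import math
from typing import List, Tuple


def accuracy_and_sem(scores: List[int]) -> Tuple[float, float]:
    """Return (accuracy in %, standard error of the mean in %) for 0/1 scores."""
    n = len(scores)
    mean = sum(scores) / n
    # Population variance of the per-problem scores.
    variance = sum((s - mean) ** 2 for s in scores) / n
    # Standard error of the mean: sqrt(variance / n).
    return 100 * mean, 100 * math.sqrt(variance / n)


# Hypothetical per-problem results (not taken from the experiments above).
scores = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
acc, sem = accuracy_and_sem(scores)
```

Under this reading, the error shrinks with the square root of the number of problems, so small evaluation sets give wide uncertainty bands.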