llm-debate

Runs multi-agent debate with open-source HuggingFace models on the arithmetic task.

gen_math.py runs debates with a HuggingFace model (e.g. mistralai/Mistral-7B-Instruct-v0.2).
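As a rough illustration of what a debate round looks like, here is a minimal sketch. It is not the code in gen_math.py: the names `run_debate`, `make_stub_agent`, and the prompt wording are assumptions made for this example, and the agents are stubs so the snippet runs without downloading a model. In the first round each agent sees only the question; in later rounds each agent also sees the other agents' previous answers.

```python
from typing import Callable, List

# Hypothetical agent interface: takes a prompt string, returns a reply string.
Agent = Callable[[str], str]


def run_debate(question: str, agents: List[Agent], rounds: int) -> List[List[str]]:
    """Return history, where history[r][i] is agent i's reply in round r."""
    history: List[List[str]] = []
    for r in range(rounds):
        replies = []
        for i, agent in enumerate(agents):
            if r == 0:
                # First round: every agent answers the question independently.
                prompt = question
            else:
                # Later rounds: show this agent the other agents' last answers.
                others = [a for j, a in enumerate(history[-1]) if j != i]
                prompt = (
                    question
                    + "\nOther agents answered:\n"
                    + "\n".join(others)
                    + "\nUsing these answers as additional input, give an updated answer."
                )
            replies.append(agent(prompt))
        history.append(replies)
    return history


def make_stub_agent(answer: str) -> Agent:
    # Stand-in for a real model call (assumption): always returns the same text.
    return lambda prompt: answer


agents = [make_stub_agent("14"), make_stub_agent("15")]
history = run_debate("What is 2 + 3 * 4?", agents, rounds=2)
print(history)
```

To use a real model, each stub would be replaced by a function that sends the prompt to the model and returns the generated text.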

To reproduce the figure in the original paper (scaling with rounds and agents):

Set agents/rounds and run the ./gen_math.sh training script.
Generate the figures in outputs.ipynb.

gen_math_panel.py and ./gen_math.sh are modified to run a panel experiment with several different HuggingFace models. Models are selected from a list of available options by passing their indices as command-line arguments.
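Selecting models by index could look roughly like the sketch below. This is an illustration only, not the argument handling in gen_math_panel.py: the `--models` flag, the `select_models` helper, and every entry in `AVAILABLE_MODELS` other than the Mistral model named above are assumptions.

```python
import argparse
from typing import List, Optional

# Hypothetical list of options; the repository's actual list may differ.
AVAILABLE_MODELS = [
    "mistralai/Mistral-7B-Instruct-v0.2",
    "example-org/model-b",  # placeholder name, not a real model
    "example-org/model-c",  # placeholder name, not a real model
]


def select_models(argv: Optional[List[str]] = None) -> List[str]:
    """Map the indices given on the command line to model names."""
    parser = argparse.ArgumentParser(description="Pick panel models by index")
    parser.add_argument(
        "--models",  # hypothetical flag name
        type=int,
        nargs="+",
        required=True,
        help="indices into AVAILABLE_MODELS, e.g. --models 0 2",
    )
    args = parser.parse_args(argv)
    for idx in args.models:
        # Reject indices outside the list instead of failing later.
        if not 0 <= idx < len(AVAILABLE_MODELS):
            parser.error(f"model index {idx} is out of range")
    return [AVAILABLE_MODELS[i] for i in args.models]
```

Under these assumptions, passing `--models 0 2` would build a two-model panel from the first and third entries of the list.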

Scaling agents and rounds

Reproduced figures

Panel experiment

Testing with a diverse panel of open-source HuggingFace models

model	Arithmetic (%)	std
Single Agent (Mistral)	16	7.3
Single Agent Panel (Mistral)	28	8.9
Multi Agent Panel	24	8.5

Original paper Github

ellenjxu/llm-debate