Official implementation of our paper "Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration".

Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration

Overview | News | Example | Usage

✨ Overview

This repository contains the official implementation of our paper Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration.

We introduce a multi-agent collaboration strategy that emulates the academic peer review process. Each agent independently constructs its own solution, provides reviews on the solutions of others, and assigns confidence levels to its reviews. Upon receiving peer reviews, agents revise their initial solutions.
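The three stages described above (solve, review with confidence, revise) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in peer_review.py: the `Agent` wrapper, the `Review` tuple shape, `peer_review_round`, and the stub agents are hypothetical names introduced here, and each callable stands in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (feedback text, confidence level); this shape is an assumption for illustration.
Review = Tuple[str, int]


@dataclass
class Agent:
    """Hypothetical agent wrapper; each callable stands in for an LLM call."""
    name: str
    solve: Callable[[str], str]                      # question -> solution
    review: Callable[[str, str], Review]             # question, peer solution -> review
    revise: Callable[[str, str, List[Review]], str]  # question, own solution, reviews -> revision


def peer_review_round(agents: List[Agent], question: str) -> Dict[str, str]:
    """Run one solve / review / revise round and return each agent's revised solution."""
    # Stage 1: every agent constructs its own solution independently.
    initial = {a.name: a.solve(question) for a in agents}

    # Stage 2: every agent reviews each peer's solution and attaches a confidence.
    received: Dict[str, List[Review]] = {a.name: [] for a in agents}
    for reviewer in agents:
        for author in agents:
            if reviewer.name != author.name:
                received[author.name].append(
                    reviewer.review(question, initial[author.name])
                )

    # Stage 3: every agent revises its initial solution given the reviews it received.
    return {
        a.name: a.revise(question, initial[a.name], received[a.name])
        for a in agents
    }


def make_stub(name: str, answer: str, confidence: int) -> Agent:
    """Build a deterministic stand-in agent so the sketch runs without an API key."""
    def revise(question: str, own: str, reviews: List[Review]) -> str:
        # Toy policy: adopt the most confident review only if it beats our own confidence.
        best_feedback, best_conf = max(reviews, key=lambda r: r[1])
        return best_feedback if best_conf > confidence else own

    return Agent(name, lambda q: answer, lambda q, s: (answer, confidence), revise)


agents = [make_stub("A", "4", 9), make_stub("B", "4", 8), make_stub("C", "5", 3)]
revised = peer_review_round(agents, "What is 2 + 2?")
print(revised)
```

In this toy run the low-confidence agent C drops its initial answer of 5 in favor of the answer backed by a more confident reviewer, while A and B keep theirs. The revision policy in the stub is a placeholder; in the actual method the revision is produced by the language model itself.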

Extensive experiments on three different types of reasoning tasks show that our collaboration approach delivers superior accuracy across all ten datasets compared to existing methods.

If you have any questions, please feel free to contact us by e-mail at xuzhenran.hitsz@gmail.com or open an issue in the repository.

🔥 News

[Nov 14, 2023] We released the code and the results of our method.

🚀 Example

[Figure: Multi-agent Peer Review]

🚨 Usage

Environment

conda create -n MAPR python=3.9
conda activate MAPR
pip install -r requirements.txt

Run

Take the GSM8K dataset as an example.

1. Peer Review

python peer_review.py --task GSM8K --openai_key YOUR_KEY --openai_organization YOUR_ORG

2. Debate

python debate.py --task GSM8K --openai_key YOUR_KEY --openai_organization YOUR_ORG

3. Peer Review w/o Confidence

python feedback.py --task GSM8K --openai_key YOUR_KEY --openai_organization YOUR_ORG

4. Self-correction

python self_correction.py --task GSM8K --openai_key YOUR_KEY --openai_organization YOUR_ORG

5. Majority and Zero-shot CoT

python single_agent.py --task GSM8K --openai_key YOUR_KEY --openai_organization YOUR_ORG
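For readers unfamiliar with the Majority baseline, the idea is to sample several independent answers and keep the most frequent one. The sketch below is an illustration only and is not taken from single_agent.py: `majority_vote` is a hypothetical helper, and it assumes the final answers have already been extracted as strings.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty answer, or None if there is none.

    Ties are broken in favor of the answer that appears first, which is how
    Counter.most_common orders equal counts.
    """
    # Drop missing or blank answers (e.g. when answer extraction failed).
    valid = [a.strip() for a in answers if a is not None and a.strip()]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


final_answer = majority_vote(["18", "18", "20"])
print(final_answer)
```

Zero-shot CoT corresponds to the single-sample case, where the vote reduces to returning that one answer.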

Evaluate

Take the GSM8K dataset as an example.

python eval.py --task GSM8K --method peer_review --time_flag 1113