/Multi-Agent-Papers

The awesome agents in the era of large language models

Papers on LLM-based Agent Collaboration

In the era of large language models (LLMs), LLM-based agents have shown remarkable performance on several existing benchmarks and in real-world applications. Nevertheless, they still struggle with complex tasks. Inspired by collaborative problem solving, several recent works adopt multi-agent collaboration as a potential solution.

We collect must-read papers to help readers catch up on state-of-the-art methods and to facilitate related research.

LLM-based Agent

  1. ReAct: Synergizing Reasoning and Acting in Language Models [paper] [code]
  • Dataset: HotpotQA, FEVER, ALFWorld, WebShop

Link: more previous works can be found in:

Thanks a lot for their pioneering efforts.

Multi-Agent Collaboration

  1. [2023/10] MetaAgents: Simulating Interactions Of Human Behaviors For LLM-Based Task-Oriented Coordination Via Collaborative Generative Agents (Lehigh University) [paper]
  • task: Task-oriented Social
  1. [2023/10] GameGPT: Multi-agent Collaborative Framework For Game Development (AutoGame Research) [paper]
  • task: Coding, Game Development, Multi-Agent cooperation
  1. [2023/10] Evaluating Multi-agent Coordination Abilities In Large Language Models (University of California, Santa Cruz) [paper]
  • task: Multi-agent coordination, LLM-ToM-Reasoning
  1. [2023/10] Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation using Large Language Models [paper] [code]
  • task: Visual Semantic Navigation
  • Dataset: HM3D
  1. [2023/10] Dynamic LLM-Agent Network: An LLM-Agent Collaboration Framework With Agent Team Optimization [paper]
  • task: Arithmetic reasoning, general reasoning, code generation
  • Dataset: MATH, MMLU, HumanEval
  1. [2023/10] Multi-agent Consensus Seeking Via Large Language Models (Westlake University) [paper]
  • task: Reasoning
  1. [2023/10] Exploring Collaboration Mechanisms For LLM Agents: A Social Psychology View (National University of Singapore, NUS-NCS Joint Lab) [paper]
  • task: Multi-agent cooperation
  • Dataset: MMLU, MATH, BIG-Bench Benchmark
  1. [2023/10] Corex: Pushing The Boundaries Of Complex Reasoning Through Multi-Model Collaboration [paper] [code]
  • task: Reasoning
  • Dataset: GSM8K, MultiArith, SingleOP/SingleEQ, AddSub, AQuA, SVAMP, GSMHard, StrategyQA, CommonsenseQA, BoolQ, AI2 Reasoning Challenge (ARC-c), BigBench, FinQA, ConvFinQA, TAT-QA
  1. [2023/10] Language Agents With Reinforcement Learning For Strategic Play In The Werewolf Game [paper]
  • task: Werewolf game
  1. [2023/10] AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems (Gaoling School of Artificial Intelligence, Renmin University of China) [paper]
  • task: Recommendation
  • Dataset: CDs and Vinyl, Office Products
  1. [2023/10] AgentVerse: Facilitating Multi-Agent Collaboration And Exploring Emergent Behaviors [paper] [code]
  • task: Conversation, Mathematical Calculation, Logical Reasoning, Coding
  • Dataset: FED, CommonGen-Challenge, MGSM, BigBench, HumanEval
  1. [2023/10] Large Language Models Can Design Game-Theoretic Objectives For Multi-Agent Planning [paper]
  • task: Embodied Intelligence
  • Dataset: ThreeDWorld Transport Challenge
  1. [2023/09] Chain-Of-Experts: When LLMs Meet Complex Operations Research Problems [paper] [code]
  • task: Math (LP)
  • Dataset: LPWP, ComplexOR
  1. [2023/09] OKR-Agent: An Object And Key Results Driven Agent System With Hierarchical Self-Collaboration And Self-Evaluation [paper]
  • task: Storyboard Generation, Creative Writing, Trip Planning
  • Dataset: (case study)
  1. [2023/09] Reason To Behave: Achieving Human-Level Task Execution For Physics-Based Characters [paper] [code]
  • task: Path Planning
  • Dataset: MoCap
  1. [2023/09] Adapting LLM Agents Through Communication [paper]
  • task: Path Planning, QA, Math reasoning
  • Dataset: ALFWorld, HotpotQA, GSM8K
  1. [2023/09] AutoAgents: A Framework For Automatic Agent Generation [paper] [code]
  • task: Open-ended Question Answering, Trivia Creative Writing
  • Dataset: MT-bench
  1. [2023/09] MetaGPT: Meta Programming For A Multi-Agent Collaborative Framework [paper]
  • task: Coding
  • Dataset: HumanEval, MBPP, SoftwareDev
  1. [2023/09] OceanGPT: A Large Language Model For Ocean Science Tasks [paper]
  • task: Ocean-related Task
  • Dataset: open-access literature, OceanBench
  1. [2023/09] Playing Repeated Games With Large Language Models [paper]
  • task: Cooperation and coordination games
  1. [2023/09] ChatEval: Towards Better LLM-Based Evaluators Through Multi-Agent Debate [paper]
  • task: QA
  • Dataset: FairEval, Topical-Chat
  1. [2023/09] MindAgent: Emergent Gaming Interaction [paper]
  • task: Planning, Coordination
  • Dataset: Cuisine World
  1. [2023/09] Building Cooperative Embodied Agents Modularly With Large Language Model [paper] [code]
  • task: Planning, Conversation, Cooperation
  • Dataset: ThreeDWorld Multi-Agent Transport (TDW-MAT)
  1. [2023/09] AutoGen: Enabling Next-Gen LLM Applications Via Multi-Agent Conversation (Microsoft Research) [paper] [code]
  • task: Math, QA, Decision, Coding, Chat, Chess
  • Dataset: MATH, Natural Questions, ALFWorld
  1. [2023/08] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework. (Microsoft Research) [paper] [code]
  • task: Multi-agent Cooperation, Conversation, MMLU
  • Dataset: MATH, Natural Questions, ALFWorld, OptiGuide
  1. [2023/08] Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration. (University of Illinois Urbana-Champaign) [paper] [code]
  • task: Cognitive synergy
  • Dataset: BigBench, TriviaQA
  1. [2023/08] CGMI: Configurable General Multi-Agent Interaction Framework. (East China Normal University) [paper]
  • task: Replicate human interactions in real-world scenarios
  1. [2023/08] ProAgent: Building Proactive Cooperative AI with Large Language Models. (Institute for Artificial Intelligence, Peking University) [paper] [code]
  • task: Cooperative Reasoning, Planning
  • Dataset: Overcooked-AI
  1. [2023/07] RoCo: Dialectic Multi-Robot Collaboration with Large Language Models. (Columbia University) [paper] [code]
  • task: Communication, Path Planning, Reasoning
  • Dataset: RoCoBench
  1. [2023/07] Communicative Agents For Software Development (Tsinghua University) [paper]
  • task: Coding
  • Dataset: Camel
  1. [2023/06] When Large Language Model Based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm (Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China) [paper]
  • task: User Simulation
  • Dataset: RecAgent
  1. [2023/06] Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents. (University of Alberta) [paper]
  • task: Multi-Agent coordination
  1. [2023/05] Training Socially Aligned Language Models in Simulated Human Society. (Dartmouth College) [paper] [code]
  • task: Learn From Simulated Social Interactions
  • Dataset: Anthropic RLHF
  1. [2023/05] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks. (Allen Institute for Artificial Intelligence) [paper] [code]
  • task: Reasoning, Path Planning
  • Dataset: ScienceWorld
  1. [2023/05] ChatGPT as your Personal Data Scientist. (Auburn University) [paper]
  • task: AutoML
  • Dataset: UCI Machine Learning Repository, Cora
  1. [2023/05] Agents: An Open-source Framework for Autonomous Language Agents. (ETH Zürich) [paper] [code]
  • task: Planning, Tool Usage, Multi-Agents communication
  1. [2023/05] Improving Factuality and Reasoning in Language Models through Multiagent Debate. (Google Brain) [paper] [code]
  • task: Mathematical Reasoning, Strategic Reasoning
  • Dataset: GSM8K, MMLU
  1. [2023/05] Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback. (University of Edinburgh) [paper] [code]
  • task: Autonomously Improve
  1. [2023/05] Examining the Inter-Consistency of Large Language Models: An In-depth Analysis via Debate. (Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China) [paper]
  • task: Multi-Agents Coordination
  • Dataset: αNLI, CSQA, COPA, e-CARE, Social IQa, PIQA, StrategyQA
  1. [2023/01] Blind Judgement: Agent-Based Supreme Court Modelling With GPT. (McGill University) [paper]
  • task: Reasoning, Prediction
  • Dataset: SCDB

Datasets

We gather information on commonly used datasets for reference. Please be aware that the numbers may differ slightly between dataset versions.

| Name (link) | Task | Number | Evaluation* | Paper |
| --- | --- | --- | --- | --- |
| HotpotQA | open-domain QA | train/dev/test: 88k/5.6k/5.6k | Exact Match (EM) | paper |
| MMLU | multiple-choice questions | train/dev/test: 99.8k/285/1.531k | Multitask Accuracy | paper |
| MATH | reasoning | 1.25k | Exact Match (EM) | paper |
| ALFWorld | Embodied AI | 3.5k// | Generalization | paper |
| Natural Questions | QA | 30.7k//0.78k | Exact Match (EM) | paper |
| GSM8K | reasoning | 7.5k//1.062k | Exact Match (EM) | paper |
| HumanEval | coding | 164 handwritten programming questions | Correctness | paper |
| BigBench | coding | 214 tasks | Correctness, Fluency | paper |
| AI2 Reasoning Challenge | choice question | 3.37k/0.87k/3.55k | Correctness | paper |
| MGSM | Math | 8/0.25k | Exact Match (EM) | paper |
| FairEval | LLM evaluation | 80 | Accuracy (Fairness) | paper |
| MBPP | coding | 0.37k/0.09k/0.5k | Accuracy | paper |
| Topical-Chat | chat | 11k | Coherence, Knowledge grounding, Contextual relevance | paper |
| WinoGrande | choice | 9.25k/1.25k/1.77k | Accuracy | paper |
| CommonsenseQA | commonsense knowledge QA | 12k | Accuracy | paper |
| FinQA | Numerical Reasoning over Financial Data | 8.28k | Accuracy | paper |
| BoolQ | yes/no questions | 9.23k//3.27k | Accuracy | paper |
| GSMHard | math | 1.32k// | Correctness | |
| SVAMP | math | 1k | Accuracy with semantic variations | paper |
| ConvFinQA | Numerical Reasoning in Conversational Finance | 3k/0.4k/0.4k | Correctness in neural symbolic methods and prompting-based methods | paper |
| TAT-QA | Finance QA | 16k | Correctness | paper |
| MultiArith | math | 420//180 | Accuracy, Precision, Recall, and F1-score | |
| common_gen | constrained text generation task | 67.4k/4.02k/1.5k | Coherent | paper |
| Toolbench | Tool Usage | 16k | API function call success rate | paper |
| RestBench | Resolve instructions | 157 | Understand and execute complex instructions | paper |
| ToolQA | Use external tools for question answering | 1.5k | Success rate in answering questions | paper |

Simulation with Multi-agent

  1. Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation
  2. Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
  3. Welfare Diplomacy: Benchmarking Language Model Cooperation
  4. Rethinking the Buyer’s Inspection Paradox in Information Markets with Language Agents
  5. Lyfe Agents: generative agents for low-cost real-time social interactions
  6. SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series
  7. SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

Evaluation

  1. Theory Of Mind For Multi-agent Collaboration Via Large Language Models [paper]
  2. Evaluating Large Language Models at Evaluating Instruction Following [paper]
  3. AgentBench: Evaluating LLMs as Agents
  4. Identifying the Risks of LM Agents with an LM-Emulated Sandbox
  5. Evaluating Multi-Agent Coordination Abilities in Large Language Models
  6. SmartPlay: A Benchmark for LLMs as Intelligent Agents

Acknowledgement

We thank all the paper authors for their excellent work. We also extend our thanks to all contributors.

For Contribution: We may have missed important works in this field. Please contribute to this repo! Thanks in advance for your efforts.

Contact

For any questions, feel free to contact us. We also welcome any form of collaboration.

Email: shizhl@mail.sdu.edu.cn