This is a collection of Multi-Agent Reinforcement Learning (MARL) papers. Each category is a potential starting point for your research. Some papers are listed more than once because they belong to multiple categories.
For MARL papers with code and MARL resources, please refer to MARL Papers with Code and MARL Resources Collection.
I will continually update this repository, and I welcome suggestions (missing important papers, missing categories, invalid links, etc.). This is only a first draft so far, and I'll add more resources in the next few months.
This repository is not for commercial purposes.
My email: chenhao2019@ia.ac.cn
- Reviews
- Dealing With Credit Assignment Issue
- Policy Gradient
- Communication
- Emergent
- Opponent Modeling
- Game Theoretic
- Hierarchical
- Ad Hoc Teamwork
- League Training
- Curriculum Learning
- Mean Field
- Transfer Learning
- Meta Learning
- Fairness
- Exploration
- Graph Neural Network
- Model-based
- NAS
- Safe Multi-Agent Reinforcement Learning
- From Single-Agent to Multi-Agent
- Discrete-Continuous Hybrid Action Spaces / Parameterized Action Space
- Role
- Multi-Agent Path Finding
- TODO
- A Survey and Critique of Multiagent Deep Reinforcement Learning
- An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective
- Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
- A Review of Cooperative Multi-Agent Deep Reinforcement Learning
- Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning
- A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
- Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications
- A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems
- If multi-agent learning is the answer, what is the question?
- Multiagent learning is not the answer. It is the question
- Is multiagent deep reinforcement learning the answer or the question? A brief survey (Note that A Survey and Critique of Multiagent Deep Reinforcement Learning is an updated version of this paper by the same authors.)
- Evolutionary Dynamics of Multi-Agent Learning: A Survey
- (Worth reading although they're not recent reviews.)
- VDN: Value-Decomposition Networks For Cooperative Multi-Agent Learning
- QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
- QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
- NDQ: Learning Nearly Decomposable Value Functions Via Communication Minimization
- CollaQ: Multi-Agent Collaboration via Reward Attribution Decomposition
- SQDDPG: Shapley Q-Value: A Local Reward Approach to Solve Global Reward Games
- QPLEX: Duplex Dueling Multi-Agent Q-Learning
- QPD: Q-value Path Decomposition for Deep Multiagent Reinforcement Learning
- Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
- QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning
- COMA: Counterfactual Multi-Agent Policy Gradients
- LiCA: Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
- MADDPG: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
- COMA: Counterfactual Multi-Agent Policy Gradients
- IPPO: Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
- MAPPO: The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
- MAAC: Actor-Attention-Critic for Multi-Agent Reinforcement Learning
- DOP: Off-Policy Multi-Agent Decomposed Policy Gradients
- Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient
- CommNet: Learning Multiagent Communication with Backpropagation
- BiCNet: Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games
- VAIN: Attentional Multi-agent Predictive Modeling
- IC3Net: Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks
- VBC: Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control
- Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation
- NDQ: Learning Nearly Decomposable Value Functions Via Communication Minimization
- RIAL/DIAL: Learning to Communicate with Deep Multi-Agent Reinforcement Learning
- ATOC: Learning Attentional Communication for Multi-Agent Cooperation
- Fully decentralized multi-agent reinforcement learning with networked agents
- TarMAC: Targeted Multi-Agent Communication
- SchedNet: Learning to Schedule Communication in Multi-Agent Reinforcement Learning
- Learning Multi-agent Communication under Limited-bandwidth Restriction for Internet Packet Routing
- Gated-ACML: Learning Agent Communication under Limited Bandwidth by Message Pruning
- Learning Efficient Multi-agent Communication: An Information Bottleneck Approach
- Coordinating Multi-Agent Reinforcement Learning with Limited Communication
- Multiagent Cooperation and Competition with Deep Reinforcement Learning
- Multi-agent Reinforcement Learning in Sequential Social Dilemmas
- Emergent preeminence of selfishness: an anomalous Parrondo perspective
- Emergent Coordination Through Competition
- Biases for Emergent Communication in Multi-agent Reinforcement Learning
- Towards Graph Representation Learning in Emergent Communication
- Emergent Tool Use From Multi-Agent Autocurricula
- On Emergent Communication in Competitive Multi-Agent Teams
- QED: Quasi-Equivalence Discovery for Zero-Shot Emergent Communication
- Incorporating Pragmatic Reasoning Communication into Emergent Language
- Bayesian Opponent Exploitation in Imperfect-Information Games
- LOLA: Learning with Opponent-Learning Awareness
- Variational Autoencoders for Opponent Modeling in Multi-Agent Systems
- Stable Opponent Shaping in Differentiable Games
- Opponent Modeling and Strategic Reasoning in the Real-time Strategy Game Starcraft
- Opponent Modeling in Deep Reinforcement Learning
- Game Theory-Based Opponent Modeling in Large Imperfect-Information Games
- α-Rank: Multi-Agent Evaluation by Evolution
- α^α-Rank: Practically Scaling α-Rank through Stochastic Optimisation
- A Game Theoretic Framework for Model Based Reinforcement Learning
- Fictitious Self-Play in Extensive-Form Games
- An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning
- Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
- Real World Games Look Like Spinning Tops
- PSRO: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
- Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games
- A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems
- Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients
- Hierarchical multi-agent reinforcement learning
- Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery
- Hierarchical Critics Assignment for Multi-agent Reinforcement Learning
- Hierarchical Reinforcement Learning for Multi-agent MOBA Game
- Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction
- HAMA: Multi-Agent Actor-Critic with Hierarchical Graph Attention Network
- CollaQ: Multi-Agent Collaboration via Reward Attribution Decomposition
- A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems
- Half Field Offense: An Environment for Multiagent Learning and Ad Hoc Teamwork
- Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems
- From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning
- EPC: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning
- Emergent Tool Use From Multi-Agent Autocurricula
- Learning to Teach in Cooperative Multiagent Reinforcement Learning
- StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning
- Cooperative Multi-agent Control using deep reinforcement learning
- Mean Field Multi-Agent Reinforcement Learning
- Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning
- Bayesian Multi-type Mean Field Multi-agent Imitation Learning
- A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems
- Parallel Knowledge Transfer in Multi-Agent Reinforcement Learning
- A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning
- Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
- FEN: Learning Fairness in Multi-Agent Systems
- Fairness in Multiagent Resource Allocation with Dynamic and Partial Observations
- Fairness in Multi-agent Reinforcement Learning for Stock Trading
- EITI/EDTI: Influence-Based Multi-Agent Exploration
- MAVEN: Multi-Agent Variational Exploration
- CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning
- Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning
- Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework
- Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory
- LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning
- Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
- Multi-Agent Game Abstraction via Graph Attention Neural Network
- Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation
- Multi-Agent Reinforcement Learning with Graph Clustering
- Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems
- MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding
- Safer Deep RL with Shallow MCTS: A Case Study in Pommerman
- IQL: Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents
- IPPO: Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
- MAPPO: The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
- MADDPG: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
- Deep Reinforcement Learning in Parameterized Action Space
- DMAPQN: Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces
- H-PPO: Hybrid actor-critic reinforcement learning in parameterized action space
- P-DQN: Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space
- ROMA: Multi-Agent Reinforcement Learning with Emergent Roles
- RODE: Learning Roles to Decompose Multi-Agent Tasks
- TODO
- Multi-Agent Path Finding