Icon descriptions:
- 🚀 - state-of-the-art generic agent/technique at the time of the paper's publication.
- ⭐ - valuable paper in a specific domain.
- Model-based RL (Model-based).
- Multi-Agent RL (MARL).
- Self-Play.
- Evolutionary & Genetic Algorithms (Evolution).
- Generalization across environments (Generalization).
- Neural Networks & Optimizers (NN).
- Manipulation tasks (Manipulator).
- Locomotion: MuJoCo, Roboschool, etc. (Locomotion).
- Navigation tasks (Navigation).
- Strategy Planning Problems (Planning).
- Transfer learning (Transfer).
- Inverse Reinforcement Learning (IRL).
- Meta-Learning.
- Sparse Reward Problems and/or Montezuma's Revenge (Sparse).
- Atari games (Atari).
- Table games (Table).
- Doom game (Doom).
- Starcraft game (Starcraft).
- Go game (Go).
Deep Reinforcement Learning
A constantly evolving list of Reinforcement Learning papers, notes, books, implementations, etc.
RL Frameworks/Implementations
Baselines @ OpenAI
- https://github.com/openai/baselines
- Implemented in TensorFlow: PPO, A2C, DQN, TRPO, ACKTR, DDPG, HER, GAIL
Dopamine @ Google
- https://github.com/google/dopamine
- Implemented in TensorFlow: Rainbow, c51, IQN, DQN
TensorForce
- https://github.com/reinforceio/tensorforce
- Implemented in TensorFlow: A3C, PPO, TRPO, DQN
RL Agents benchmarks
Benchmarks for: PPO, A2C, ACKTR, ACER
Benchmarks for: Rainbow, c51, IQN, DQN
Benchmarks for: Vanilla DQN, Double DQN, Dueling DQN, Prioritized DQN
Papers
Year 2019
🚀 Go-Explore: a New Approach for Hard-Exploration Problems
- [arXiv] Ecoffet et al., 2019; Uber AI Labs
- Sparse
Year 2018
Exploration by Random Network Distillation (RND)
SFV: Reinforcement Learning of Physical Skills from Videos
Large-Scale Study of Curiosity-Driven Learning
Human-level performance in first-person multiplayer games with population-based deep reinforcement learning
Evolving Multimodal Robot Behavior via Many Stepping Stones with the Combinatorial Multi-Objective Evolutionary Algorithm
- [arXiv] Huizinga & Clune, 2018; University of Wyoming
- Meta-Learning
Learning Dexterous In-Hand Manipulation
⭐ RUDDER: Return Decomposition for Delayed Rewards
Relational Deep Reinforcement Learning
- [arXiv] Zambaldi et al.; Google Deepmind
- Planning, Starcraft
Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems
- [arXiv] Stanton and Clune; University of Wyoming
- Sparse
AutoAugment: Learning Augmentation Policies from Data
- [arXiv] Cubuk et al.; Google Brain
- NN
Playing Atari with Six Neurons
- [arXiv] Cuccu et al.; University of Fribourg, NYU
- Atari
⭐ World Models
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari
- [arXiv] Chrabaszcz et al.; University of Freiburg
- Evolution, Atari
🚀 Implicit Quantile Networks for Distributional Reinforcement Learning (IQN)
- [arXiv] Dabney et al., 2018; Google Deepmind
- Atari
🚀 A Distributional Perspective on Reinforcement Learning (c51)
- [arXiv] Bellemare et al., 2017; Google Deepmind
- Atari
🚀 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
- [arXiv] Espeholt et al.; Google Deepmind
- Atari, Navigation
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
- [arXiv] Finn et al.; UC Berkeley
- IRL, Manipulator
⭐ Regularized Evolution for Image Classifier Architecture Search
- [arXiv] Real et al.; Google Brain
- Evolution, NN
Building Generalizable Agents with a Realistic and Rich 3D Environment
- [arXiv] Wu et al., 2018; Berkeley, FAIR
- Generalization, Navigation
Year 2017
⭐ Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
- [arXiv] Such et al.; Uber AI Labs
- Locomotion, Atari
⭐ Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
- [arXiv] Silver et al.; Google Deepmind
- Self-Play, Planning, Table
🚀 Rainbow: Combining Improvements in Deep Reinforcement Learning (DQN improvements combined)
- [arXiv] Hessel et al.; Google Deepmind
- Atari
Meta Learning Shared Hierarchies
One-Shot Visual Imitation Learning via Meta-Learning
Learning with Opponent-Learning Awareness (LOLA)
🚀 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR, A2C)
- [arXiv] Wu et al.; University of Toronto, New York University
- Locomotion, Atari
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
🚀 Proximal Policy Optimization Algorithms (PPO)
⭐ Learning Transferable Architectures for Scalable Image Recognition
- [arXiv] Zoph et al.; Google Brain
- NN
Hybrid Reward Architecture for Reinforcement Learning (HRA)
- [arXiv] van Seijen et al.; Microsoft Maluuba, McGill University
- Meta-Learning, Atari
Parameter Space Noise for Exploration
- [arXiv] Plappert et al.; OpenAI, Karlsruhe Institute of Technology
- Locomotion, Atari
⭐ Mastering the Game of Go without Human Knowledge (AlphaGo Zero)
Neural Optimizer Search with Reinforcement Learning
- [pdf] Bello et al.; Google Brain
- NN
Asymmetric Actor Critic for Image-Based Robot Learning
- [arXiv], [official blog post] Pinto et al.; OpenAI, CMU
- Generalization, Manipulator
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
A Deep Reinforcement Learning Chatbot
- [arXiv] Serban et al.; MILA
Learning model-based planning from scratch
⭐ Imagination-Augmented Agents for Deep Reinforcement Learning (I2As)
Distral: Robust Multitask Reinforcement Learning
- [arXiv] Teh et al.; Google Deepmind
- Transfer, Navigation
Emergence of Locomotion Behaviours in Rich Environments
Programmable Agents
- [arXiv] Denil et al.; Google Deepmind
- Locomotion
⭐ Evolution Strategies as a Scalable Alternative to Reinforcement Learning
- [arXiv] Salimans et al.; OpenAI
- Atari
Neural Episodic Control
- [arXiv] Pritzel et al.; Google Deepmind
- Atari
Year 2016
The Predictron: End-To-End Learning and Planning
- [arXiv] Silver et al.; Google Deepmind
- Model-based, Planning, Navigation
RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
- [arXiv] Duan et al.; Berkeley, OpenAI
- Meta-Learning, Navigation
Neural Architecture Search with Reinforcement Learning
- [arXiv] B. Zoph and Quoc V. Le; Google Brain; ICLR.
- NN
⭐ Learning to Navigate in Complex Environments
- [arXiv] Mirowski et al., 2016; Google Deepmind
- Navigation
⭐ Reinforcement Learning with unsupervised auxiliary tasks (UNREAL)
🚀 Learning to act by predicting the future (VizDoom 2016 Full DM Winner)
- [arXiv] Dosovitskiy, Koltun; Intel Labs
- Navigation, Doom
Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games
- [arXiv] Peng et al.; Alibaba Group, University College London
- MARL, Starcraft
Playing FPS Games with Deep Reinforcement Learning (VizDoom 2016 Limited DM 2nd place)
- [arXiv] Lample, Chaplot; Carnegie Mellon University
- Navigation, Doom
[RTS:SC] Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks
- [arXiv] Usunier et al.; Facebook AI Research
- Starcraft
🚀 Asynchronous Methods for Deep Reinforcement Learning (A3C)
Year 2015
🚀 Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)
- [arXiv] Wang et al.; Google Deepmind
- Atari
🚀 Prioritized Experience Replay
🚀 Deep Reinforcement Learning with Double Q-learning (Double DQN)
- [arXiv] Hasselt et al.; Google Deepmind
- Atari
High-dimensional continuous control using generalized advantage estimation
- [arXiv] Schulman et al.; Berkeley
- Locomotion
⭐ Trust Region Policy Optimization (TRPO)
- [arXiv] Schulman et al.; UC Berkeley
- Atari, Navigation, Locomotion
🚀 Human-level control through deep reinforcement learning (DQN)
Mastering the game of Go with deep neural networks and tree search (AlphaGo)
Year 2013
🚀 Playing Atari with Deep Reinforcement Learning (DQN)
- [arXiv] Mnih et al.; DeepMind Technologies
- Atari
Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning
- [pdf] Koutnik et al.; IDSIA, USI-SUPSI
- Evolution
2012 and earlier
Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction
- [pdf] Sutton et al. (2011); University of Alberta, McGill University
- Manipulator, Locomotion
⭐ Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion
- [pdf] Kohl and Stone (2004); The University of Texas at Austin
- Manipulator, Locomotion
⭐ Autonomous helicopter flight via reinforcement learning
- [pdf] Ng et al. (2004); Stanford, Berkeley
- Manipulator
⭐ Actor-Critic Algorithms
- [pdf] Konda and Tsitsiklis (2003)
⭐ Temporal Difference Learning and TD-Gammon
- [pdf] Gerald Tesauro (1995)
- Table
⭐ Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (REINFORCE)
- [pdf] Ronald J. Williams (1992); Northeastern University
Surveys
A Brief Survey of Deep Reinforcement Learning
- [arXiv] Arulkumaran et al. (2017)
Books
⭐ Reinforcement Learning: An Introduction (Complete Draft)
- [pdf] Richard S. Sutton and Andrew G. Barto (2018)
Miscellaneous
How to Read a Paper
- [pdf] S. Keshav (2007); University of Waterloo
ArXiv Sanity Preserver: A recommender system for searching papers that are published on arXiv.
GitXiv: A recommender system for searching papers and their supplementary materials (if available).