Icon descriptions:

🚀 - state-of-the-art generic agent/technique at the moment of paper publication.
⭐ - valuable paper in some specific domain.
- Model-based RL (Model-based).
- Multi-Agent RL (MARL).
- Self-Play.
- Evolutionary & Genetic Algorithms (Evolution).
- Generalization across environments (Generalization).
- Neural Networks & Optimizers (NN).
- Manipulation tasks (Manipulator).
- Locomotion: MuJoCo, Roboschool, etc. (Locomotion).
- Navigation tasks (Navigation).
- Strategy Planning Problems (Planning).
- Transfer learning (Transfer).
- Inverse Reinforcement Learning (IRL).
- Meta-Learning.
- Sparse Reward Problems and/or Montezuma's Revenge (Sparse).
- Atari game (Atari).
- Table games (Table).
- Doom game (Doom).
- Starcraft game (Starcraft).
- Go game (Go).

Deep Reinforcement Learning

A constantly evolving list of Reinforcement Learning papers, notes, books, implementations, etc.

RL Frameworks/Implementations

Baselines @ OpenAI

https://github.com/openai/baselines
Implemented in TensorFlow: PPO, A2C, DQN, TRPO, ACKTR, DDPG, HER, GAIL

Dopamine @ Google

https://github.com/google/dopamine
Implemented in TensorFlow: Rainbow, c51, IQN, DQN

TensorForce

https://github.com/reinforceio/tensorforce
Implemented in TensorFlow: A3C, PPO, TRPO, DQN

RL Agents benchmarks

Benchmarks for: PPO, A2C, ACKTR, ACER

https://github.com/openai/baselines-results/blob/master/acktr_ppo_acer_a2c_atari.ipynb

Benchmarks for: Rainbow, c51, IQN, DQN

https://google.github.io/dopamine/baselines/plots.html

Benchmarks for: Vanilla DQN, Double DQN, Dueling DQN, Prioritized DQN

https://github.com/openai/baselines-results/blob/master/dqn_results.ipynb

Papers

Year 2019

🚀 Go-Explore: a New Approach for Hard-Exploration Problems

[arXiv] Ecoffet et al., 2019; Uber AI Labs
Sparse

Year 2018

Exploration by Random Network Distillation (RND)

[arXiv] [blog] [code] Burda et al., 2018; OpenAI
Sparse

SFV: Reinforcement Learning of Physical Skills from Videos

[arXiv] [blog] Peng et al., 2018; Berkeley
Meta-Learning, IRL

Large-Scale Study of Curiosity-Driven Learning

[blog] [pdf] Burda et al., 2018; OpenAI, Berkeley, Univ. of Edinburgh
Navigation, Sparse

Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

[arXiv] [blog] Jaderberg et al., 2018; Google DeepMind
Navigation

Evolving Multimodal Robot Behavior via Many Stepping Stones with the Combinatorial Multi-Objective Evolutionary Algorithm

[arXiv] Huizinga & Clune, 2018; University of Wyoming
Meta-Learning

Learning Dexterous In-Hand Manipulation

[arXiv] [blog] Andrychowicz et al.; OpenAI
Generalization, Manipulator

⭐ RUDDER: Return Decomposition for Delayed Rewards

[arXiv] [code] Arjona-Medina et al.; Johannes Kepler University Linz
Sparse, Atari

Relational Deep Reinforcement Learning

[arXiv] Zambaldi et al.; Google DeepMind
Planning, Starcraft

Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems

[arXiv] Stanton and Clune; University of Wyoming
Sparse

AutoAugment: Learning Augmentation Policies from Data

[arXiv] Cubuk et al.; Google Brain
NN

Playing Atari with Six Neurons

[arXiv] Cuccu et al.; University of Fribourg, NYU
Atari

⭐ World Models

[arXiv] [blog] Ha and Schmidhuber; IDSIA, Google Brain, NNAISENSE
Model-based, Doom

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

[arXiv] Chrabaszcz et al.; University of Freiburg
Evolution, Atari

🚀 Implicit Quantile Networks for Distributional Reinforcement Learning (IQN)

[arXiv] Dabney et al., 2018; Google DeepMind
Atari

🚀 A Distributional Perspective on Reinforcement Learning (c51)

[arXiv] Bellemare et al., 2018; Google DeepMind
Atari

🚀 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

[arXiv] Espeholt et al.; DeepMind
Atari, Navigation

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

[arXiv] Yu et al.; UC Berkeley
IRL, Manipulator

⭐ Regularized Evolution for Image Classifier Architecture Search

[arXiv] Real et al.; Google Brain
Evolution, NN

Building Generalizable Agents with a Realistic and Rich 3D Environment

[arXiv] Wu et al., 2018; Berkeley, FAIR
Generalization, Navigation

Year 2017

⭐ Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

[arXiv] Such et al.; Uber AI Labs
Locomotion, Atari

⭐ Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

[arXiv] Silver et al.; Google DeepMind
Self-Play, Planning, Table

🚀 Rainbow: Combining Improvements in Deep Reinforcement Learning (DQN improvements combined)

[arXiv] Hessel et al.; Google DeepMind
Atari

Meta Learning Shared Hierarchies

[arXiv] [blog] Frans et al.; OpenAI, Berkeley
Locomotion, Meta-Learning

One-Shot Visual Imitation Learning via Meta-Learning

[arXiv] [pdf] Finn et al.; UC Berkeley, OpenAI
IRL, Meta-Learning, Manipulator

Learning with Opponent-Learning Awareness (LOLA)

[arXiv] [blog] Foerster et al.; OpenAI, Oxford, Berkeley, CMU
MARL

🚀 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR, A2C)

[arXiv] Wu et al.; University of Toronto, New York University
Locomotion, Atari

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

[arXiv] [blog] [code] Nagabandi et al.; Berkeley
Locomotion

🚀 Proximal Policy Optimization Algorithms (PPO)

[arXiv] [blog] Schulman et al.; OpenAI
Locomotion, Atari

⭐ Learning Transferable Architectures for Scalable Image Recognition

[arXiv] Zoph et al.; Google Brain
NN

Hybrid Reward Architecture for Reinforcement Learning (HRA)

[arXiv] van Seijen et al.; Microsoft Maluuba, McGill University
Meta-Learning, Atari

Parameter Space Noise for Exploration

[arXiv] Plappert et al.; OpenAI, Karlsruhe Institute of Technology
Locomotion, Atari

⭐ Mastering the Game of Go without Human Knowledge (AlphaGo Zero)

[pdf] [blog] Silver et al.; DeepMind
Self-Play, Planning, Go, Table

Neural Optimizer Search with Reinforcement Learning

[pdf] Bello et al.; Google Brain
NN

Asymmetric Actor Critic for Image-Based Robot Learning

[arXiv] [blog] Pinto et al.; OpenAI, CMU
Generalization, Manipulator

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

[arXiv] [blog] Peng et al.; OpenAI, Berkeley
Generalization, Manipulator

A Deep Reinforcement Learning Chatbot

[arXiv] Serban et al.; MILA

Learning model-based planning from scratch

[arXiv] [blog] Pascanu et al.; Google DeepMind
Model-based, Locomotion

⭐ Imagination-Augmented Agents for Deep Reinforcement Learning (I2As)

[arXiv] [blog] Weber et al.; DeepMind
Planning, Transfer, Atari

Distral: Robust Multitask Reinforcement Learning

[arXiv] Teh et al.; Google DeepMind
Transfer, Navigation

Emergence of Locomotion Behaviours in Rich Environments

[arXiv] [blog] Heess et al.; DeepMind
Locomotion

Programmable Agents

[arXiv] Denil et al.; Google DeepMind
Locomotion

⭐ Evolution Strategies as a Scalable Alternative to Reinforcement Learning

[arXiv] Salimans et al.; OpenAI
Atari

Neural Episodic Control

[arXiv] Pritzel et al.; Google DeepMind
Atari

Year 2016

The Predictron: End-To-End Learning and Planning

[arXiv] Silver et al.; Google DeepMind
Model-based, Planning, Navigation

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

[arXiv] Duan et al.; Berkeley, OpenAI
Meta-Learning, Navigation

Neural Architecture Search with Reinforcement Learning

[arXiv] B. Zoph and Quoc V. Le; Google Brain; ICLR.
NN

⭐ Learning to Navigate in Complex Environments

[arXiv] Mirowski et al., 2016; Google DeepMind
Navigation

⭐ Reinforcement Learning with unsupervised auxiliary tasks (UNREAL)

[arXiv] Jaderberg et al.; Google DeepMind
📝 Notes
Locomotion, Atari, Navigation

🚀 Learning to act by predicting the future (VizDoom 2016 Full DM Winner)

[arXiv] Dosovitskiy, Koltun; Intel Labs
Navigation, Doom

Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games

[arXiv] Peng et al.; Alibaba Group, University College London
MARL, Starcraft

Playing FPS Games with Deep Reinforcement Learning (VizDoom 2016 Limited DM 2nd place)

[arXiv] Lample, Chaplot; Carnegie Mellon University
Navigation, Doom

Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

[arXiv] Usunier et al.; Facebook AI Research
Starcraft

🚀 Asynchronous Methods for Deep Reinforcement Learning (A3C)

[arXiv] Mnih et al.; Google DeepMind
📝 Notes
Locomotion, Atari, Navigation

Year 2015

🚀 Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)

[arXiv] Wang et al.; Google DeepMind
Atari

🚀 Prioritized Experience Replay

[arXiv] Schaul et al.; Google DeepMind
📝 Notes
Atari

🚀 Deep Reinforcement Learning with Double Q-learning (Double DQN)

[arXiv] van Hasselt et al.; Google DeepMind
Atari

High-dimensional continuous control using generalized advantage estimation

[arXiv] Schulman et al.; Berkeley
Locomotion

⭐ Trust Region Policy Optimization (TRPO)

[arXiv] Schulman et al.; UC Berkeley
Atari, Navigation, Locomotion

🚀 Human-level control through deep reinforcement learning (DQN)

[Nature] [pdf] Mnih et al.; Google DeepMind
📝 Notes
Atari

Mastering the game of Go with deep neural networks and tree search (AlphaGo)

[Nature] [reddit] Silver et al.; DeepMind, Google
Self-Play, Planning, Go, Table

Year 2013

🚀 Playing Atari with Deep Reinforcement Learning (DQN)

[arXiv] Mnih et al.; DeepMind Technologies
Atari

Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning

[pdf] Koutnik et al.; IDSIA, USI-SUPSI
Evolution

2012 and earlier

Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction

[pdf] Sutton et al. (2011); University of Alberta, McGill University
Manipulator, Locomotion

⭐ Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion

[pdf] Kohl and Stone (2004); The University of Texas at Austin
Manipulator, Locomotion

⭐ Autonomous helicopter flight via reinforcement learning

[pdf] Ng et al. (2004); Stanford, Berkeley
Manipulator

⭐ Actor-Critic Algorithms

[pdf] Konda and Tsitsiklis (2003)

⭐ Temporal Difference Learning and TD-Gammon

[pdf] Gerald Tesauro (1995)
Table

⭐ Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (REINFORCE)

[pdf] Ronald J. Williams (1992); Northeastern University

Surveys

A Brief Survey of Deep Reinforcement Learning

[arXiv] Arulkumaran et al. (2017)

Books

⭐ Reinforcement Learning: An Introduction (Complete Draft)

[pdf] Richard S. Sutton and Andrew G. Barto (2018)

Miscellaneous

How to Read a Paper

[pdf] S. Keshav (2007); University of Waterloo

ArXiv Sanity Preserver: A recommender system for searching papers that are published on arXiv.

http://www.arxiv-sanity.com/

GitXiv: A recommender system for searching papers and their supplementary materials (if available).

http://www.gitxiv.com/

sridas123/reinforcement-learning-notes