A constantly evolving list of Reinforcement Learning papers, notes, books, etc.

Glossary:

🚀 - state-of-the-art method in its domain at the time of publication.
⭐ - valuable paper.

Domain Tags:

- Atari game (Atari).
- Doom game (Doom).
- Starcraft game (Starcraft).
- Neural Networks & Optimizers (NN).
- Go game (Go).
- Table games (Table).
- Real-robot applications (Robot).
- Real/simulated robotic locomotion, e.g. MuJoCo, Roboschool (Locomotion).
- Mazes and Labyrinths (Maze).
- Multi-agent learning (Multi).
- Methods with continuous action space support (Continuous).
- Complex planning problems (Planning).
- Transfer learning (Transfer).
- Real-time strategy video games (RTS).
- First-person shooter video games (FPS).

Deep Reinforcement Learning

Year 2018

🚀 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

[arXiv], [pdf]
Espeholt et al.; DeepMind
Atari, Maze

⭐ One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

[arXiv], [pdf]
Finn et al.; UC Berkeley
Robot, Meta-Learning

🚀 Regularized Evolution for Image Classifier Architecture Search

[arXiv], [pdf]
Real et al.; Google Brain
NN

Year 2017

🚀 Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

[arXiv], [pdf]
Such et al.; Uber AI Labs
Atari, Locomotion, Continuous

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

[arXiv], [pdf]
Silver et al.; DeepMind
Table

🚀 Rainbow: Combining Improvements in Deep Reinforcement Learning (DQN improvements combined)

[arXiv], [pdf]
Hessel et al.; DeepMind
Atari

⭐ Meta Learning Shared Hierarchies

[arXiv], [pdf], [official blog post]
Frans et al.; OpenAI, UC Berkeley
Locomotion, Continuous, Meta-Learning

One-Shot Visual Imitation Learning via Meta-Learning

[arXiv], [pdf]
Finn et al.; UC Berkeley, OpenAI
Robot, Continuous, Meta-Learning

⭐ Learning with Opponent-Learning Awareness (LOLA)

[arXiv], [pdf], [official blog post]
Foerster et al.; OpenAI, University of Oxford, UC Berkeley, Carnegie Mellon University
Multi

🚀 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR, A2C)

[arXiv], [pdf]
Wu et al.; University of Toronto, New York University
Atari, Locomotion, Continuous

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

[arXiv], [pdf], [official blog post], [official code]
Nagabandi et al.; Berkeley
Locomotion, Continuous

🚀 Proximal Policy Optimization Algorithms (PPO)

[arXiv], [pdf], [official blog post]
Schulman et al.; OpenAI
Atari, Locomotion, Continuous

🚀 Learning Transferable Architectures for Scalable Image Recognition

[arXiv], [pdf]
Zoph et al.; Google Brain
NN

⭐ Hybrid Reward Architecture for Reinforcement Learning (HRA)

[arXiv], [pdf]
van Seijen et al.; Microsoft Maluuba, McGill University
Atari

Parameter Space Noise for Exploration

[arXiv], [pdf]
Plappert et al.; OpenAI, Karlsruhe Institute of Technology
Atari, Locomotion, Continuous

🚀 Mastering the Game of Go without Human Knowledge (AlphaGo Zero)

[pdf], [official blog post]
Silver et al.; DeepMind
Go, Table

Neural Optimizer Search with Reinforcement Learning

[pdf]
Bello et al.; Google Brain
NN

Asymmetric Actor Critic for Image-Based Robot Learning

[arXiv], [pdf], [official blog post]
Pinto et al.; OpenAI, CMU
Robot, Continuous

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

[arXiv], [pdf], [official blog post]
Peng et al.; OpenAI, UC Berkeley
Robot, Continuous

A Deep Reinforcement Learning Chatbot

[arXiv], [pdf]
Serban et al.; MILA

Learning model-based planning from scratch

[arXiv], [pdf], [official blog post]
Pascanu et al.; Google DeepMind
Locomotion, Continuous

⭐ Imagination-Augmented Agents for Deep Reinforcement Learning (I2As)

[arXiv], [pdf], [official blog post]
Weber et al.; DeepMind
Planning, Atari, Transfer

Distral: Robust Multitask Reinforcement Learning

[arXiv], [pdf]
Teh et al.; DeepMind
Maze, Transfer

Emergence of Locomotion Behaviours in Rich Environments

[arXiv], [pdf], [official blog post]
Heess et al.; DeepMind
Locomotion, Continuous

Programmable Agents

[arXiv], [pdf]
Denil et al.; DeepMind
Locomotion, Continuous

⭐ Evolution Strategies as a Scalable Alternative to Reinforcement Learning

[arXiv], [pdf]
Salimans et al.; OpenAI
Atari

Neural Episodic Control

[arXiv], [pdf]
Pritzel et al.; DeepMind
Brief summary: the NEC agent is extremely data-efficient. Its performance at 5 million frames is matched by DQN with Prioritized Replay only after 40 million frames. However, its final performance is still worse than that of other state-of-the-art agents.
Atari

Year 2016

The Predictron: End-To-End Learning and Planning

[arXiv], [pdf]
Silver et al.; DeepMind
Maze, Planning

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

[arXiv], [pdf]
Duan et al.; Berkeley, OpenAI
Maze, Meta-Learning

Neural Architecture Search with Reinforcement Learning

[arXiv], [pdf]
Zoph and Le; Google Brain; ICLR
NN

Reinforcement Learning with unsupervised auxiliary tasks (UNREAL)

[arXiv], [pdf]
Jaderberg et al.; Google DeepMind
📝 Notes
Atari, Maze, Locomotion, Continuous

🚀 Learning to act by predicting the future (VizDoom 2016 Full DM Winner)

[arXiv], [pdf]
Dosovitskiy, Koltun; Intel Labs
Doom, Maze, FPS

Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games

[arXiv], [pdf]
Peng et al.; Alibaba Group, University College London
Starcraft, Multi

Playing FPS Games with Deep Reinforcement Learning (VizDoom 2016 Limited DM 2nd place)

[arXiv], [pdf]
Lample, Chaplot; Carnegie Mellon University
Doom, Maze, FPS

Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

[arXiv], [pdf]
Usunier et al.; Facebook AI Research
Starcraft, RTS

🚀 Asynchronous Methods for Deep Reinforcement Learning (A3C)

[arXiv], [pdf]
Mnih et al.; DeepMind
📝 Notes
Atari, Maze, Locomotion, Continuous

Year 2015

⭐ Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)

[arXiv], [pdf]
Wang et al.; DeepMind
Atari

Prioritized Experience Replay

[arXiv], [pdf]
Schaul et al.; DeepMind
📝 Notes
Atari

⭐ Deep Reinforcement Learning with Double Q-learning (Double DQN)

[arXiv], [pdf]
van Hasselt et al.; DeepMind
Atari

High-dimensional continuous control using generalized advantage estimation

[arXiv], [pdf]
Schulman et al.; Berkeley
Locomotion, Continuous

⭐ Trust Region Policy Optimization (TRPO)

[arXiv], [pdf]
Schulman et al.; UC Berkeley
Atari, Maze, Locomotion, Continuous

🚀 Human-level control through deep reinforcement learning (DQN)

[Nature], [reddit]
Mnih et al.; Google DeepMind
📝 Notes
Atari

Mastering the game of Go with deep neural networks and tree search (AlphaGo)

[Nature], [reddit]
Silver et al.; DeepMind, Google
Go, Table

Year 2013

🚀 Playing Atari with Deep Reinforcement Learning (DQN)

[arXiv], [pdf]
Mnih et al.; DeepMind Technologies
Atari

Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning

[pdf]
Koutnik et al.; IDSIA, USI-SUPSI

2012 and earlier

Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction

[pdf]
Sutton et al. (2011); University of Alberta, McGill University
Robot, Locomotion

⭐ Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion

[pdf]
Kohl and Stone (2004); The University of Texas at Austin
Robot, Locomotion

⭐ Autonomous helicopter flight via reinforcement learning

[pdf]
Ng et al. (2004); Stanford, Berkeley
Robot

⭐ Actor-Critic Algorithms

[pdf]
Konda and Tsitsiklis (2003)

⭐ Temporal Difference Learning and TD-Gammon

[pdf]
Gerald Tesauro (1995)
Table

⭐ Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (REINFORCE)

[pdf]
Ronald J. Williams (1992); Northeastern University

Books

⭐ Reinforcement Learning: An Introduction (Complete Draft)

[pdf]
Richard S. Sutton and Andrew G. Barto (2018)

Miscellaneous

How to Read a Paper

[pdf]
S. Keshav (2007); University of Waterloo

Ava4wonder/reinforcement-learning-notes