A constantly evolving list of Reinforcement Learning papers, notes, books etc.
Glossary:
- 🚀 - state-of-the-art method in its domain at the time of publication.
- ⭐ - valuable paper.
Domain Tags:
- Atari - Atari game.
- Doom - Doom game.
- Starcraft - Starcraft game.
- NN - Neural Networks & Optimizers.
- Go - Go game.
- Table - Table games.
- Robot - Real-robot applications.
- Locomotion - Real/simulated robotic locomotion (MuJoCo, Roboschool, etc.).
- Maze - Mazes and labyrinths.
- Multi - Multi-agent learning.
- Continuous - Methods that support continuous action spaces.
- Planning - Complex planning problems.
- Transfer - Transfer learning.
- RTS - Real-Time Strategy video game.
- FPS - First-Person Shooter video game.
🚀 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
⭐ One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
🚀 Regularized Evolution for Image Classifier Architecture Search
🚀 Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
🚀 Rainbow: Combining Improvements in Deep Reinforcement Learning (DQN improvements combined)
⭐ Meta Learning Shared Hierarchies
- [arXiv], [pdf], [official blog post]
- Frans et al.; OpenAI, UC Berkeley
- Locomotion, Continuous, Meta-Learning
One-Shot Visual Imitation Learning via Meta-Learning
⭐ Learning with Opponent-Learning Awareness (LOLA)
- [arXiv], [pdf], [official blog post]
- Foerster et al.; OpenAI, University of Oxford, UC Berkeley, Carnegie Mellon University
- Multi
🚀 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR, A2C)
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
- [arXiv], [pdf], [official blog post], [official code]
- Nagabandi et al.; Berkeley
- Locomotion, Continuous
🚀 Proximal Policy Optimization Algorithms (PPO)
- [arXiv], [pdf], [official blog post]
- Schulman et al.; OpenAI
- Atari, Locomotion, Continuous
🚀 Learning Transferable Architectures for Scalable Image Recognition
⭐ Hybrid Reward Architecture for Reinforcement Learning (HRA)
Parameter Space Noise for Exploration
- [arXiv], [pdf]
- Plappert et al.; OpenAI, Karlsruhe Institute of Technology
- Atari, Locomotion, Continuous
🚀 Mastering the Game of Go without Human Knowledge (AlphaGo Zero)
- [pdf], [official blog post]
- Silver et al.; DeepMind
- Go, Table
Neural Optimizer Search with Reinforcement Learning
- [pdf]
- Bello et al.; Google Brain
- NN
Asymmetric Actor Critic for Image-Based Robot Learning
- [arXiv], [pdf], [official blog post]
- Pinto et al.; OpenAI, CMU
- Robot, Continuous
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
- [arXiv], [pdf], [official blog post]
- Peng et al.; OpenAI, UC Berkeley
- Robot, Continuous
A Deep Reinforcement Learning Chatbot
Learning model-based planning from scratch
- [arXiv], [pdf], [official blog post]
- Pascanu et al.; Google DeepMind
- Locomotion, Continuous
⭐ Imagination-Augmented Agents for Deep Reinforcement Learning (I2As)
- [arXiv], [pdf], [official blog post]
- Weber et al.; DeepMind
- Planning, Atari, Transfer
Distral: Robust Multitask Reinforcement Learning
Emergence of Locomotion Behaviours in Rich Environments
- [arXiv], [pdf], [official blog post]
- Heess et al.; DeepMind
- Locomotion, Continuous
Programmable Agents
⭐ Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Neural Episodic Control
- [arXiv], [pdf]
- Pritzel et al.; DeepMind
- Brief summary: the NEC agent is extremely data efficient. DQN with Prioritized Replay needs 40 million frames to reach the performance NEC achieves at 5 million frames. However, NEC's final performance is still worse than that of other state-of-the-art agents.
- Atari
The Predictron: End-To-End Learning and Planning
RL²: Fast Reinforcement Learning via Slow Reinforcement Learning
Neural Architecture Search with Reinforcement Learning
Reinforcement Learning with unsupervised auxiliary tasks (UNREAL)
🚀 Learning to act by predicting the future (VizDoom 2016 Full DM Winner)
Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games
Playing FPS Games with Deep Reinforcement Learning (VizDoom 2016 Limited DM 2nd place)
Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks
- RTS, Starcraft
🚀 Asynchronous Methods for Deep Reinforcement Learning (A3C)
⭐ Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)
Prioritized Experience Replay
⭐ Deep Reinforcement Learning with Double Q-learning (Double DQN)
High-dimensional continuous control using generalized advantage estimation
⭐ Trust Region Policy Optimization (TRPO)
🚀 Human-level control through deep reinforcement learning (DQN)
Mastering the game of Go with deep neural networks and tree search (AlphaGo)
🚀 Playing Atari with Deep Reinforcement Learning (DQN)
Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning
- [pdf]
- Koutnik et al.; IDSIA, USI-SUPSI
Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction
- [pdf]
- Sutton et al. (2011); University of Alberta, McGill University
- Robot, Locomotion
⭐ Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion
- [pdf]
- Kohl and Stone (2004); The University of Texas at Austin
- Robot, Locomotion
⭐ Autonomous helicopter flight via reinforcement learning
- [pdf]
- Ng et al. (2004); Stanford, Berkeley
- Robot
⭐ Actor-Critic Algorithms
- [pdf]
- Konda and Tsitsiklis (2003)
⭐ Temporal Difference Learning and TD-Gammon
- [pdf]
- Gerald Tesauro (1995)
- Table
⭐ Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (REINFORCE)
- [pdf]
- Ronald J. Williams (1992); Northeastern University
⭐ Reinforcement Learning: An Introduction (Complete Draft)
- [pdf]
- Richard S. Sutton and Andrew G. Barto (2018)
How to Read a Paper
- [pdf]
- S. Keshav (2007); University of Waterloo