reinforcement-learning-notes

Survey on (Deep) Reinforcement Learning papers and algorithms

MIT License

Icon descriptions:
  • 🚀 - state-of-the-art generic agent/technique at the moment of paper publication.
  • ⭐ - valuable paper in some specific domain.
  • model - Model-based RL (Model-based).
  • marl - Multi-Agent RL (MARL).
  • sp - Self-Play.
  • evo - Evolutionary & Genetic Algorithms (Evolution).
  • generalization - Generalization across environments (Generalization).
  • nn - Neural Networks & Optimizers (NN).
  • robot - Manipulation tasks (Manipulator).
  • loco - Locomotion: MuJoCo, Roboschool, etc. (Locomotion).
  • navi - Navigation tasks (Navigation).
  • plan - Strategy Planning Problems (Planning).
  • transfer - Transfer learning (Transfer).
  • irl - Inverse Reinforcement Learning (IRL).
  • meta - Meta-Learning (Meta-Learning).
  • sparse - Sparse Reward Problems and/or Montezuma's Revenge (Sparse).
  • atari - Atari games (Atari).
  • table - Table games (Table).
  • doom - Doom game (Doom).
  • sc - Starcraft game (Starcraft).
  • go - Go game (Go).

Deep Reinforcement Learning

A constantly evolving list of Reinforcement Learning papers, notes, books, implementations, etc.

RL Frameworks/Implementations

Baselines @ OpenAI

Dopamine @ Google

TensorForce

RL Agents benchmarks

Benchmarks for: PPO, A2C, ACKTR, ACER

Benchmarks for: Rainbow, c51, IQN, DQN

Benchmarks for: Vanilla DQN, Double DQN, Dueling DQN, Prioritized DQN

Papers

Year 2019

🚀 Go-Explore: a New Approach for Hard-Exploration Problems

  • [arXiv] Ecoffet et al., 2019; Uber AI Labs
  • sparse Sparse

Year 2018

Exploration by Random Network Distillation (RND)

SFV: Reinforcement Learning of Physical Skills from Videos

  • [arXiv] [blog] Peng et al., 2018; Berkeley
  • meta irl Meta-Learning, IRL

Large-Scale Study of Curiosity-Driven Learning

  • [blog] [pdf] Burda et al., 2018; OpenAI, Berkeley, Univ. of Edinburgh
  • navi sparse Navigation, Sparse

Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

  • [arXiv] [blog] Jaderberg et al., 2018; Google DeepMind
  • navi Navigation

Evolving Multimodal Robot Behavior via Many Stepping Stones with the Combinatorial Multi-Objective Evolutionary Algorithm

  • [arXiv] Huizinga & Clune, 2018; University of Wyoming
  • meta Meta-Learning

Learning Dexterous In-Hand Manipulation

  • [arXiv] [blog] Andrychowicz et al.; OpenAI
  • generalization robot Generalization, Manipulator

RUDDER: Return Decomposition for Delayed Rewards

  • [arXiv] [code] Arjona-Medina et al.; Johannes Kepler University Linz
  • sparse atari Sparse, Atari

Relational Deep Reinforcement Learning

  • [arXiv] Zambaldi et al.; Google DeepMind
  • plan sc Planning, Starcraft

Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems

  • [arXiv] Stanton and Clune; University of Wyoming
  • sparse Sparse

AutoAugment: Learning Augmentation Policies from Data

  • [arXiv] Cubuk et al.; Google Brain
  • nn NN

Playing Atari with Six Neurons

  • [arXiv] Cuccu et al.; University of Fribourg, NYU
  • atari Atari

World Models

  • [arXiv] [blog] Ha and Schmidhuber; IDSIA, Google Brain, NNAISENSE
  • model doom Model-based, Doom

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

  • [arXiv] Chrabaszcz et al.; University of Freiburg
  • evo atari Evolution, Atari

🚀 Implicit Quantile Networks for Distributional Reinforcement Learning (IQN)

  • [arXiv] Dabney et al., 2018; Google DeepMind
  • atari Atari

🚀 A Distributional Perspective on Reinforcement Learning (c51)

  • [arXiv] Bellemare et al., 2017; Google DeepMind
  • atari Atari

🚀 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

  • [arXiv] Espeholt et al.; Google DeepMind
  • atari navi Atari, Navigation

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

  • [arXiv] Yu et al.; UC Berkeley
  • irl robot IRL, Manipulator

Regularized Evolution for Image Classifier Architecture Search

  • [arXiv] Real et al.; Google Brain
  • evo nn Evolution, NN

Building Generalizable Agents with a Realistic and Rich 3D Environment

  • [arXiv] Wu et al., 2018; Berkeley, FAIR
  • generalization navi Generalization, Navigation

Year 2017

Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

  • [arXiv] Such et al.; Uber AI Labs
  • loco atari Locomotion, Atari

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

  • [arXiv] Silver et al.; Google DeepMind
  • sp plan table Self-Play, Planning, Table

🚀 Rainbow: Combining Improvements in Deep Reinforcement Learning (DQN improvements combined)

  • [arXiv] Hessel et al.; Google DeepMind
  • atari Atari

Meta Learning Shared Hierarchies

  • [arXiv] [blog] Frans et al.; OpenAI, Berkeley.
  • meta loco Locomotion, Meta-Learning

One-Shot Visual Imitation Learning via Meta-Learning

  • [arXiv] [pdf] Finn et al.; UC Berkeley, OpenAI
  • irl meta robot IRL, Meta-Learning, Manipulator

Learning with Opponent-Learning Awareness (LOLA)

  • [arXiv] [blog] Foerster et al.; OpenAI, Oxford, Berkeley, CMU
  • marl MARL

🚀 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR, A2C)

  • [arXiv] Wu et al.; University of Toronto, New York University
  • loco atari Locomotion, Atari

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

🚀 Proximal Policy Optimization Algorithms (PPO)

  • [arXiv] [blog] Schulman et al.; OpenAI
  • atari loco Locomotion, Atari

Learning Transferable Architectures for Scalable Image Recognition

  • [arXiv] Zoph et al.; Google Brain
  • nn NN

Hybrid Reward Architecture for Reinforcement Learning (HRA)

  • [arXiv] van Seijen et al.; Microsoft Maluuba, McGill University
  • meta atari Meta-Learning, Atari

Parameter Space Noise for Exploration

  • [arXiv] Plappert et al.; OpenAI, Karlsruhe Institute of Technology
  • loco atari Locomotion, Atari

Mastering the Game of Go without Human Knowledge (AlphaGo Zero)

  • [pdf] [blog] Silver et al.; DeepMind
  • sp plan go table Self-Play, Planning, Go, Table

Neural Optimizer Search with Reinforcement Learning

  • [pdf] Bello et al.; Google Brain
  • nn NN

Asymmetric Actor Critic for Image-Based Robot Learning

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

  • [arXiv] [blog] Peng et al.; OpenAI, Berkeley
  • generalization robot Generalization, Manipulator

A Deep Reinforcement Learning Chatbot

  • [arXiv] Serban et al.; MILA

Learning model-based planning from scratch

  • [arXiv] [blog] Pascanu et al.; Google DeepMind
  • model loco Model-based, Locomotion

Imagination-Augmented Agents for Deep Reinforcement Learning (I2As)

  • [arXiv] [blog] Weber et al.; DeepMind
  • plan transfer atari Planning, Transfer, Atari

Distral: Robust Multitask Reinforcement Learning

  • [arXiv] Teh et al.; Google DeepMind
  • transfer navi Transfer, Navigation

Emergence of Locomotion Behaviours in Rich Environments

  • [arXiv] [blog] Heess et al.; DeepMind
  • loco Locomotion

Programmable Agents

  • [arXiv] Denil et al.; Google DeepMind
  • loco Locomotion

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

  • [arXiv] Salimans et al.; OpenAI
  • atari Atari

Neural Episodic Control

  • [arXiv] Pritzel et al.; Google DeepMind
  • atari Atari

Year 2016

The Predictron: End-To-End Learning and Planning

  • [arXiv] Silver et al.; Google DeepMind
  • model plan navi Model-based, Planning, Navigation

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

  • [arXiv] Duan et al.; Berkeley, OpenAI
  • meta navi Meta-Learning, Navigation

Neural Architecture Search with Reinforcement Learning

  • [arXiv] Zoph and Le; Google Brain; ICLR
  • nn NN

Learning to Navigate in Complex Environments

  • [arXiv] Mirowski et al., 2016; Google DeepMind
  • navi Navigation

Reinforcement Learning with unsupervised auxiliary tasks (UNREAL)

  • [arXiv] Jaderberg et al.; Google DeepMind
  • 📝 Notes
  • loco atari navi Locomotion, Atari, Navigation

🚀 Learning to act by predicting the future (VizDoom 2016 Full DM Winner)

  • [arXiv] Dosovitskiy, Koltun; Intel Labs
  • navi doom Navigation, Doom

Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games

  • [arXiv] Peng et al.; Alibaba Group, University College London
  • marl sc MARL, Starcraft

Playing FPS Games with Deep Reinforcement Learning (VizDoom 2016 Limited DM 2nd place)

  • [arXiv] Lample, Chaplot; Carnegie Mellon University
  • navi doom Navigation, Doom

[RTS:SC] Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

  • [arXiv] Usunier et al.; Facebook AI Research
  • sc Starcraft

🚀 Asynchronous Methods for Deep Reinforcement Learning (A3C)

  • [arXiv] Mnih et al.; Google DeepMind
  • 📝 Notes
  • loco atari navi Locomotion, Atari, Navigation

Year 2015

🚀 Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)

  • [arXiv] Wang et al.; Google DeepMind
  • atari Atari

🚀 Prioritized Experience Replay

  • [arXiv] Schaul et al.; Google DeepMind
  • 📝 Notes
  • atari Atari

🚀 Deep Reinforcement Learning with Double Q-learning (Double DQN)

  • [arXiv] van Hasselt et al.; Google DeepMind
  • atari Atari

High-dimensional continuous control using generalized advantage estimation

  • [arXiv] Schulman et al.; Berkeley
  • loco Locomotion

Trust Region Policy Optimization (TRPO)

  • [arXiv] Schulman et al.; UC Berkeley
  • atari navi loco Atari, Navigation, Locomotion

🚀 Human-level control through deep reinforcement learning (DQN)

  • [Nature] [pdf] Mnih et al.; Google DeepMind
  • 📝 Notes
  • atari Atari

Mastering the game of Go with deep neural networks and tree search (AlphaGo)

  • [Nature] [reddit] Silver et al.; DeepMind, Google
  • sp plan go table Self-Play, Planning, Go, Table

Year 2013

🚀 Playing Atari with Deep Reinforcement Learning (DQN)

  • [arXiv] Mnih et al.; DeepMind Technologies
  • atari Atari

Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning

  • [pdf] Koutnik et al.; IDSIA, USI-SUPSI
  • evo Evolution

2012 and earlier

Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction

  • [pdf] Sutton et al. (2011); University of Alberta, McGill University
  • robot loco Manipulator, Locomotion

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion

  • [pdf] Kohl and Stone (2004); The University of Texas at Austin
  • robot loco Manipulator, Locomotion

Autonomous helicopter flight via reinforcement learning

  • [pdf] Ng et al. (2004); Stanford, Berkeley
  • robot Manipulator

Actor-Critic Algorithms

  • [pdf] Konda and Tsitsiklis (2003)

Temporal Difference Learning and TD-Gammon

  • [pdf] Gerald Tesauro (1995)
  • table Table

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (REINFORCE)

  • [pdf] Ronald J. Williams (1992); Northeastern University

Surveys

A Brief Survey of Deep Reinforcement Learning

  • [arXiv] Arulkumaran et al. (2017)

Books

Reinforcement Learning: An Introduction (Complete Draft)

  • [pdf] Richard S. Sutton and Andrew G. Barto (2018)

Miscellaneous

How to Read a Paper

  • [pdf] S. Keshav (2007); University of Waterloo

ArXiv Sanity Preserver: A recommender system for searching papers that are published on arXiv.

GitXiv: A recommender system for searching papers and their supplementary materials (if available).