Awesome Reinforcement Learning

This page is no longer maintained.

A curated list of resources dedicated to reinforcement learning.

We have pages for other topics: awesome-rnn, awesome-deep-vision, awesome-random-forest

Maintainers: Hyunsoo Kim, Jiwon Kim

Contributing

Please feel free to send pull requests.

Codes
Theory
Applications
Tutorials / Websites
Online Demos
Open Source Reinforcement Learning Platforms

Codes

Codes for examples and exercises in Richard Sutton and Andrew Barto's Reinforcement Learning: An Introduction
- Python Code (2nd Edition)
- MATLAB Code (1st Edition, BROKEN LINK)
- C/Lisp Code
- Julia Code
- Book
- Exercise Solutions
Simulation code for Reinforcement Learning Control Problems
- Pole-Cart Problem
- Q-learning Controller
MATLAB Environment and GUI for Reinforcement Learning
Reinforcement Learning Repository - University of Massachusetts, Amherst
Brown-UMBC Reinforcement Learning and Planning Library (Java)
Reinforcement Learning in R (MDP, Value Iteration)
Reinforcement Learning Environment in Python and MATLAB
RL-Glue (standard interface for RL) and RL-Glue Library
PyBrain Library - Python-Based Reinforcement learning, Artificial intelligence, and Neural network
RLPy Framework - Value-Function-Based Reinforcement Learning Framework for Education and Research
Maja - Machine learning framework for problems in Reinforcement Learning in Python
TeachingBox - Java-based Reinforcement Learning framework
Policy Gradient Reinforcement Learning Toolbox for MATLAB
PIQLE - Platform Implementing Q-Learning and other RL algorithms
BeliefBox - Bayesian reinforcement learning library and toolkit
Deep Q-Learning with TensorFlow - A deep Q-learning demonstration using Google TensorFlow
Atari - Deep Q-networks and asynchronous agents in Torch
AgentNet - A python library for deep reinforcement learning and custom recurrent networks using Theano+Lasagne.
Reinforcement Learning Examples by RLCode - A collection of minimal and clean reinforcement learning examples
OpenAI Baselines - Well-tested implementations (and results) of reinforcement learning algorithms from OpenAI
PyTorch Deep RL - Popular deep RL algorithm implementations with PyTorch
ChainerRL - Popular deep RL algorithm implementations with Chainer
Black-DROPS - Modular and generic code for the model-based policy search Black-DROPS algorithm (IROS 2017 paper) and easy integration with the DART simulator
Gold - A reinforcement learning library for Golang.
Jumanji - A Suite of Industry-Driven Hardware-Accelerated RL Environments written in JAX.

Theory

Lectures

[DeepMind x UCL] Reinforcement Learning Lecture Series 2021
[UCL] COMPM050/COMPGI13 Reinforcement Learning by David Silver
[UCL] COMPMI22/COMPGI22 - Advanced Deep Learning and Reinforcement Learning
[UC Berkeley] CS188 Artificial Intelligence by Pieter Abbeel
[Udacity (Georgia Tech.)] CS7642 Reinforcement Learning
[Stanford] CS229 Machine Learning - Lecture 16: Reinforcement Learning by Andrew Ng
[UC Berkeley] Deep RL Bootcamp
[UC Berkeley] CS294 Deep Reinforcement Learning by John Schulman and Pieter Abbeel
[CMU] 10703: Deep Reinforcement Learning and Control, Spring 2017
[MIT] 6.S094: Deep Learning for Self-Driving Cars
- Lecture 2: Deep Reinforcement Learning for Motion Planning
[Siraj Raval] Introduction to AI for Video Games (Reinforcement Learning Video Series)
[Mutual Information] Reinforcement Learning Fundamentals

Books

Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (1st Edition, 1998) [Book] [Code]
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (2nd Edition, 2018) [Book] [Code]
Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]
Deep Reinforcement Learning in Action [Book (Manning)]
Dimitri P. Bertsekas, Reinforcement Learning and Optimal Control (2019) [Book, Video Lectures, and Course Material]

Surveys

Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey (JAIR 1996) [Paper]
S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning (Sadhana 1994) [Paper]
Matthew E. Taylor, Peter Stone, Transfer Learning for Reinforcement Learning Domains: A Survey (JMLR 2009) [Paper]
Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics: A Survey (IJRR 2013) [Paper]
Michael L. Littman, Reinforcement learning improves behaviour from evaluative feedback (Nature 2015) [Paper]
Marc P. Deisenroth, Gerhard Neumann, Jan Peters, A Survey on Policy Search for Robotics (Foundations and Trends in Robotics 2014) [Book]
Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath, A Brief Survey of Deep Reinforcement Learning (IEEE Signal Processing Magazine 2017) [DOI] [Paper]
Benjamin Recht, A Tour of Reinforcement Learning: The View from Continuous Control (Annu. Rev. Control Robot. Auton. Syst. 2019) [DOI]

Papers / Thesis

Foundational Papers

Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [DOI] [Paper] (discusses issues in RL such as the "credit assignment problem")
Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [DOI] [Paper] (earliest publication on temporal-difference (TD) learning rule)

Methods

Dynamic Programming (DP):
- Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
Monte Carlo:
- Andrew Barto, Michael Duff, Monte Carlo Matrix Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
- Satinder P. Singh, Richard S. Sutton, Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
Temporal-Difference:
- Richard S. Sutton, Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988. [Paper]
Q-Learning (Off-policy TD algorithm):
- Chris Watkins, Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
Sarsa (On-policy TD algorithm):
- G.A. Rummery, M. Niranjan, On-line Q-learning using connectionist systems, Technical Report, Cambridge Univ., 1994. [Report]
- Richard S. Sutton, Generalization in Reinforcement Learning: Successful examples using sparse coding, NIPS, 1996. [Paper]
R-Learning (learning of relative values)
- Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]
Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration)
- Steven J. Bradtke, Andrew G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
- Michail G. Lagoudakis, Ronald Parr, Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
Policy Search / Policy Gradient
- Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
- Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
- Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
- Jan Peters, Katharina Mulling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]
- Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
- Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
- Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
- Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
- Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik, Dorian Goepp, Vassilis Vassiliades, Jean-Baptiste Mouret, Black-Box Data-efficient Policy Search for Robotics, IROS, 2017. [Paper]
Hierarchical RL
- Richard Sutton, Doina Precup, Satinder Singh, Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
- George Konidaris, Andrew Barto, Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]
Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)
- V. Mnih et al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
- Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies. ArXiv, 16 Oct 2015. [ArXiv]
- Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, ArXiv, 18 Nov 2015. [ArXiv]
- Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]
- Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, ArXiv, 4 Feb 2016. [ArXiv]
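The off-policy / on-policy distinction in the Q-Learning and Sarsa entries above comes down to a single term in the TD update. The sketch below is a minimal illustration, not code from any of the listed papers or libraries: it assumes a hypothetical five-state corridor environment, and the names `step`, `epsilon_greedy`, `train`, and `N_STATES` are made up for this example. Both agents follow the same ε-greedy behaviour policy; Q-learning bootstraps from the greedy action in the next state, while Sarsa bootstraps from the action it will actually take next.

```python
import random

# Hypothetical toy environment (not from any library listed above):
# a corridor of states 0..4; state 4 is terminal and entering it pays reward 1.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor dynamics; returns (next_state, reward, done)."""
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(method, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular TD control; method is "q_learning" (off-policy) or "sarsa" (on-policy)."""
    if method not in ("q_learning", "sarsa"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q, s, epsilon, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            # The behaviour policy picks the next action the same way for both methods.
            a2 = epsilon_greedy(q, s2, epsilon, rng)
            if done:
                target = r  # terminal state: nothing to bootstrap from
            elif method == "q_learning":
                # Off-policy: bootstrap from the greedy action in s2, regardless of a2.
                target = r + gamma * max(q[s2])
            else:
                # On-policy (Sarsa): bootstrap from the action a2 that will actually be taken.
                target = r + gamma * q[s2][a2]
            q[s][a] += alpha * (target - q[s][a])
            s, a = s2, a2
    return q


if __name__ == "__main__":
    for name in ("q_learning", "sarsa"):
        q = train(name)
        greedy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
        print(name, greedy, [round(q[s][1], 3) for s in range(N_STATES - 1)])
```

With these settings both agents should end up preferring "right" in every non-terminal state. Because Q-learning's target takes the max over next actions, its estimates approach the optimal values (for example γ³ = 0.729 for moving right from state 0), whereas Sarsa's estimates track the value of the ε-greedy policy it is actually following, exploratory left moves included.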

Applications

Game Playing

Traditional Games

Backgammon - Gerald Tesauro, "TD-Gammon" game play using TD(λ) (ACM 1995) [Paper]
Chess - Jonathan Baxter, Andrew Tridgell and Lex Weaver, "KnightCap" program using TD(λ) (1999) [arXiv]
Chess - Matthew Lai, Giraffe: Using deep reinforcement learning to play chess (2015) [arXiv]

Computer Games

Atari 2600 Games - Volodymyr Mnih, Koray Kavukcuoglu, David Silver et al., Human-level Control through Deep Reinforcement Learning (Nature 2015) [DOI] [Paper] [Code] [Video]
Flappy Bird - Sarvagya Vaish, Flappy Bird Reinforcement Learning [Video]
Mario - Kenneth O. Stanley and Risto Miikkulainen, MarI/O - learning to play Mario with evolutionary reinforcement learning using artificial neural networks (Evolutionary Computation 2002) [Paper] [Video]
StarCraft II - Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki et al., Grandmaster level in StarCraft II using multi-agent reinforcement learning (Nature 2019) [DOI] [Paper] [Video]

Robotics

Nate Kohl and Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (ICRA 2004) [Paper]
Petar Kormushev, Sylvain Calinon and Darwin G. Caldwell, Robot Motor Skill Coordination with EM-based Reinforcement Learning (IROS 2010) [Paper] [Video]
Todd Hester, Michael Quinlan, and Peter Stone, Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (ICRA 2010) [Paper] [Video]
George Konidaris, Scott Kuindersma, Roderic Grupen and Andrew Barto, Autonomous Skill Acquisition on a Mobile Manipulator (AAAI 2011) [Paper] [Video]
Marc Peter Deisenroth and Carl Edward Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search (ICML 2011) [Paper]
Scott Niekum, Sachin Chitta, Bhaskara Marthi, et al., Incremental Semantically Grounded Learning from Demonstration (RSS 2013) [Paper]
Mark Cutler and Jonathan P. How, Efficient Reinforcement Learning for Robots using Informative Simulated Priors (ICRA 2015) [Paper] [Video]
Antoine Cully, Jeff Clune, Danesh Tarapore and Jean-Baptiste Mouret, Robots that can adapt like animals (Nature 2015) [ArXiv] [Video] [Code]
Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik et al., Black-Box Data-efficient Policy Search for Robotics (IROS 2017) [ArXiv] [Video] [Code]
P. Travis Jardine, Michael Kogan, Sidney N. Givigi and Shahram Yousefi, Adaptive predictive control of a differential drive robot tuned with reinforcement learning (Int J Adapt Control Signal Process 2019) [DOI]

Control

Pieter Abbeel, Adam Coates, et al., An Application of Reinforcement Learning to Aerobatic Helicopter Flight (NIPS 2006) [Paper] [Video]
J. Andrew Bagnell and Jeff G. Schneider, Autonomous helicopter control using Reinforcement Learning Policy Search Methods (ICRA 2001) [Paper]

Operations Research

Scott Proper and Prasad Tadepalli, Scaling Average-reward Reinforcement Learning for Product Delivery (AAAI 2004) [Paper]
Naoki Abe, Naval Verma et al., Cross Channel Optimized Marketing by Reinforcement Learning (KDD 2004) [Paper]
Bernd Waschneck, Andre Reichstaller, Lenz Belzner et al., Deep reinforcement learning for semiconductor production scheduling (ASMC 2018) [DOI] [Paper]

Human Computer Interaction

Satinder Singh, Diane Litman et al., Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (JAIR 2002) [Paper]

Tutorials / Websites

Mance Harmon and Stephanie Harmon, Reinforcement Learning: A Tutorial
C. Igel, M.A. Riedmiller, et al., Reinforcement Learning in a Nutshell, ESANN, 2007. [Paper]
UNSW - Reinforcement Learning
- Introduction
- TD-Learning
- Q-Learning and SARSA
- Applet for "Cat and Mouse" Game
ROS Reinforcement Learning Tutorial
POMDP for Dummies
Scholarpedia articles on:
- Reinforcement Learning
- Temporal Difference Learning
Repository with useful MATLAB Software, presentations, and demo videos
Bibliography on Reinforcement Learning
UC Berkeley - CS 294: Deep Reinforcement Learning, Fall 2015 (John Schulman, Pieter Abbeel) [Class Website]
Blog posts on Reinforcement Learning, Parts 1-4 by Travis DeWolf
The Arcade Learning Environment - Atari 2600 games environment for developing AI agents
Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy
Demystifying Deep Reinforcement Learning
Let’s make a DQN
Simple Reinforcement Learning with Tensorflow, Parts 0-8 by Arthur Juliani
Practical_RL - GitHub-based course in reinforcement learning in the wild (lectures, coding labs, projects)
RLenv.directory: Explore and find new reinforcement learning environments.
Katja Hofmann's talk at NeurIPS '19 - RL: Past, Present and Future Perspectives
How to Structure, Organize, Track and Manage Reinforcement Learning (RL) Projects
Reinforcement Learning Cheat Sheet - A summary of some important concepts and algorithms in RL

Online Demos

Real-world demonstrations of Reinforcement Learning
Deep Q-Learning Demo - A deep Q learning demonstration using ConvNetJS
Deep Q-Learning with TensorFlow - A deep Q-learning demonstration using Google TensorFlow
Reinforcement Learning Demo - A reinforcement learning demo using reinforcejs by Andrej Karpathy

Open Source Reinforcement Learning Platforms

OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms
OpenAI Universe - A software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications
DeepMind Lab - A customisable 3D platform for agent-based AI research
Project Malmo - A platform for Artificial Intelligence experimentation and research built on top of Minecraft by Microsoft
ViZDoom - Doom-based AI research platform for reinforcement learning from raw visual information
Retro Learning Environment - An AI platform for reinforcement learning based on video game emulators. Currently supports SNES and Sega Genesis. Compatible with OpenAI gym.
torch-twrl - A package that enables reinforcement learning in Torch by Twitter
UETorch - A Torch plugin for Unreal Engine 4 by Facebook
TorchCraft - Connecting Torch to StarCraft
garage - A framework for reproducible reinforcement learning research, fully compatible with OpenAI Gym and DeepMind Control Suite (successor to rllab)
TensorForce - Practical deep reinforcement learning on TensorFlow with Gitter support and OpenAI Gym/Universe/DeepMind Lab integration.
tf-TRFL - A library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Learning agents.
OpenAI lab - An experimentation system for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras.
keras-rl - State-of-the-art deep reinforcement learning algorithms in Keras, designed for compatibility with OpenAI Gym.
BURLAP - Brown-UMBC Reinforcement Learning and Planning, a library written in Java
MAgent - A Platform for Many-agent Reinforcement Learning.
Ray RLlib - Ray RLlib is a reinforcement learning library that aims to provide both performance and composability.
SLM Lab - A research framework for Deep Reinforcement Learning using Unity, OpenAI Gym, PyTorch, Tensorflow.
Unity ML Agents - Create reinforcement learning environments using the Unity Editor
Intel Coach - Coach is a Python reinforcement learning research framework containing implementations of many state-of-the-art algorithms.
Microsoft AirSim - Open source simulator based on Unreal Engine for autonomous vehicles from Microsoft AI & Research.
DI-engine - DI-engine is a generalized Decision Intelligence engine. It supports most basic deep reinforcement learning (DRL) algorithms, such as DQN, PPO, SAC, and domain-specific algorithms like QMIX in multi-agent RL, GAIL in inverse RL, and RND in exploration problems.
Jumanji - A Suite of Industry-Driven Hardware-Accelerated RL Environments written in JAX.
