Awesome Reinforcement Learning

A curated list of resources dedicated to reinforcement learning.

We have pages for other topics: awesome-rnn, awesome-deep-vision, awesome-random-forest

Maintainers: Zhimin Hou, Hyunsoo Kim, Jiwon Kim

We are looking for more contributors and maintainers!

Contributing

Please feel free to send pull requests.

Codes
Theory
Applications
Tutorials / Websites
Online Demos
Open Source Reinforcement Learning Platforms

Codes

Codes for examples and exercises in Richard Sutton and Andrew Barto's Reinforcement Learning: An Introduction
Python Code
MATLAB Code
C/Lisp Code
Book
Simulation code for Reinforcement Learning Control Problems
Pole-Cart Problem
Q-learning Controller
MATLAB Environment and GUI for Reinforcement Learning
Reinforcement Learning Repository - University of Massachusetts, Amherst
Brown-UMBC Reinforcement Learning and Planning Library (Java)
Reinforcement Learning in R (MDP, Value Iteration)
Reinforcement Learning Environment in Python and MATLAB
RL-Glue (standard interface for RL) and RL-Glue Library
PyBrain Library - Python-Based Reinforcement Learning, Artificial Intelligence, and Neural Network Library
RLPy Framework - Value-Function-Based Reinforcement Learning Framework for Education and Research
Maja - Machine learning framework for problems in Reinforcement Learning in Python
TeachingBox - Java-based Reinforcement Learning framework
Policy Gradient Reinforcement Learning Toolbox for MATLAB
PIQLE - Platform Implementing Q-Learning and other RL algorithms
BeliefBox - Bayesian reinforcement learning library and toolkit
Deep Q-Learning with Tensor Flow - A deep Q-learning demonstration using Google TensorFlow
Atari - Deep Q-networks and asynchronous agents in Torch
AgentNet - A Python library for deep reinforcement learning and custom recurrent networks using Theano+Lasagne.
Reinforcement Learning Examples by RLCode - A collection of minimal and clean reinforcement learning examples
PyTorch Deep RL - Popular deep RL algorithm implementations with PyTorch
Black-DROPS - Modular and generic code for the model-based policy search Black-DROPS algorithm (IROS 2017 paper) and easy integration with the DART simulator
Browse state-of-the-art - State-of-the-art projects across CV, NLP, and robotics
Code for real robot examples - Examples implemented on real physical robots
Benchmarking of model-based reinforcement learning - Benchmarks of model-based reinforcement learning algorithms

Theory

Blog

[Paper reading] endtoendAI

Lectures

[UCL] COMPM050/COMPGI13 Reinforcement Learning by David Silver
[UC Berkeley] CS188 Artificial Intelligence by Pieter Abbeel
[Udacity (Georgia Tech.)] CS7642 Reinforcement Learning
[Stanford] CS229 Machine Learning - Lecture 16: Reinforcement Learning by Andrew Ng
[UC Berkeley] Deep RL Bootcamp
[UC Berkeley] CS294 Deep Reinforcement Learning by John Schulman and Pieter Abbeel
[CMU] 10703: Deep Reinforcement Learning and Control, Spring 2017
[MIT] 6.S094: Deep Learning for Self-Driving Cars
- Lecture 2: Deep Reinforcement Learning for Motion Planning
[DLRL summer school] [Lectures for deep learning and reinforcement learning]
[CMU] Integrating domain knowledge into the deep learning process
[Chinese version] Intro to Reinforcement Learning (强化学习纲要)

Books

Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (1st Edition, 1998) [Book] [Code]
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (2nd Edition, in progress, 2018) [Book] [Code]
Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]

Surveys

Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey, JAIR, 1996. [Paper]
S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]
Matthew E. Taylor, Peter Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR, 2009. [Paper]
Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics, A Survey, IJRR, 2013. [Paper]
Michael L. Littman, "Reinforcement learning improves behaviour from evaluative feedback." Nature 521.7553 (2015): 445-451. [Paper]
Marc P. Deisenroth, Gerhard Neumann, Jan Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, 2014. [Book]

Papers / Thesis

Foundational Papers

Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper] (discusses issues in RL such as the "credit assignment problem")
Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper] (earliest publication on temporal-difference (TD) learning rule)

Methods

Dynamic Programming (DP):
- Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
Monte Carlo:
- Andrew Barto, Michael Duff, Monte Carlo Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
- Satinder P. Singh, Richard S. Sutton, Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
Temporal-Difference:
- Richard S. Sutton, Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988. [Paper]
Q-Learning (Off-policy TD algorithm):
- Chris Watkins, Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
Sarsa (On-policy TD algorithm):
- G.A. Rummery, M. Niranjan, On-line Q-learning using connectionist systems, Technical Report, Cambridge Univ., 1994. [Report]
- Richard S. Sutton, Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, NIPS, 1996. [Paper]
R-Learning (learning of relative values)
- Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]
Function Approximation methods (Least-Square Temporal Difference, Least-Square Policy Iteration)
- Steven J. Bradtke, Andrew G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
- Michail G. Lagoudakis, Ronald Parr, Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
Policy Search / Policy Gradient
- Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
- Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
- Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
- Jan Peters, Katharina Mulling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]
- Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
- Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
- Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
- Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
- Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik, Dorian Goepp, Vassilis Vassiliades, Jean-Baptiste Mouret, Black-Box Data-efficient Policy Search for Robotics, IROS, 2017. [Paper]
Hierarchical RL
- Richard Sutton, Doina Precup, Satinder Singh, Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
- George Konidaris, Andrew Barto, Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]
Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)
- V. Mnih, et al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
- Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies, ArXiv, 16 Oct 2015. [ArXiv]
- Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, ArXiv, 18 Nov 2015. [ArXiv]
- Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]
- Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, ArXiv, 4 Feb 2016. [ArXiv]
- Kalman Filter
- The basic lecture

Applications

Game Playing

Traditional Games

Backgammon - "TD-Gammon" game play using TD(λ) (Tesauro, ACM 1995) [Paper]
Chess - "KnightCap" program using TD(λ) (Baxter, arXiv 1999) [arXiv]
Chess - Giraffe: Using deep reinforcement learning to play chess (Lai, arXiv 2015) [arXiv]

Computer Games

Human-level Control through Deep Reinforcement Learning (Mnih, Nature 2015) [Paper] [Code] [Video]
Flappy Bird Reinforcement Learning [Video]
MarI/O - learning to play Mario with evolutionary reinforcement learning using artificial neural networks (Stanley, Evolutionary Computation 2002) [Paper] [Video]

Robotics

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
Robot Motor Skill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]
Robots that can adapt like animals (Cully, Nature 2015) [Paper] [Video] [Code]
Black-Box Data-efficient Policy Search for Robotics (Chatzilygeroudis, IROS 2017) [Paper] [Video] [Code]
Model-driven DDPG with fuzzy reward signals for robotic peg-in-hole assembly (IEEE Transactions on Industrial Informatics) [Paper]

Control

An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper] [Video]
Autonomous helicopter control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2001) [Paper]

Operations Research

Scaling Average-reward Reinforcement Learning for Product Delivery (Proper, AAAI 2004) [Paper]
Cross Channel Optimized Marketing by Reinforcement Learning (Abe, KDD 2004) [Paper]

Human Computer Interaction

Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (Singh, JAIR 2002) [Paper]

Tutorials / Websites

Mance Harmon and Stephanie Harmon, Reinforcement Learning: A Tutorial
C. Igel, M.A. Riedmiller, et al., Reinforcement Learning in a Nutshell, ESANN, 2007. [Paper]
UNSW - Reinforcement Learning
Introduction
TD-Learning
Q-Learning and SARSA
Applet for "Cat and Mouse" Game
ROS Reinforcement Learning Tutorial
POMDP for Dummies
Scholarpedia articles on:
Reinforcement Learning
Temporal Difference Learning
Repository with useful MATLAB Software, presentations, and demo videos
Bibliography on Reinforcement Learning
UC Berkeley - CS 294: Deep Reinforcement Learning, Fall 2015 (John Schulman, Pieter Abbeel) [Class Website]
Blog posts on Reinforcement Learning, Parts 1-4 by Travis DeWolf
The Arcade Learning Environment - Atari 2600 games environment for developing AI agents
Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy
Demystifying Deep Reinforcement Learning
Let’s make a DQN
Simple Reinforcement Learning with Tensorflow, Parts 0-8 by Arthur Juliani
Practical_RL - GitHub-based course in reinforcement learning in the wild (lectures, coding labs, projects)
Principles of Deep RL by David Silver (Deep Learning Indaba)
Success Stories of Deep RL by David Silver (Deep Learning Indaba)

Online Demos

Real-world demonstrations of Reinforcement Learning
Deep Q-Learning Demo - A deep Q learning demonstration using ConvNetJS
Deep Q-Learning with Tensor Flow - A deep Q-learning demonstration using Google TensorFlow
Reinforcement Learning Demo - A reinforcement learning demo using reinforcejs by Andrej Karpathy

Open Source Reinforcement Learning Platforms

OpenAI gym - A toolkit for developing and comparing reinforcement learning algorithms
OpenAI universe - A software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications
DeepMind Lab - A customisable 3D platform for agent-based AI research
Project Malmo - A platform for Artificial Intelligence experimentation and research built on top of Minecraft by Microsoft
ViZDoom - Doom-based AI research platform for reinforcement learning from raw visual information
Retro Learning Environment - An AI platform for reinforcement learning based on video game emulators. Currently supports SNES and Sega Genesis. Compatible with OpenAI gym.
torch-twrl - A package that enables reinforcement learning in Torch by Twitter
UETorch - A Torch plugin for Unreal Engine 4 by Facebook
TorchCraft - Connecting Torch to StarCraft
rllab - A framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym
TensorForce - Practical deep reinforcement learning on TensorFlow with Gitter support and OpenAI Gym/Universe/DeepMind Lab integration.
OpenAI lab - An experimentation system for Reinforcement Learning using OpenAI Gym, TensorFlow, and Keras.
keras-rl - State-of-the-art deep reinforcement learning algorithms in Keras designed for compatibility with OpenAI Gym.
BURLAP - Brown-UMBC Reinforcement Learning and Planning, a library written in Java
MAgent - A Platform for Many-agent Reinforcement Learning.
Ray RLlib - A reinforcement learning library that aims to provide both performance and composability.
MIT Autonomous Driving Lab - Includes DeepTraffic and deep learning for autonomous driving.

hzm2016/sources-of-reinforcement-learning