A curated list of resources dedicated to reinforcement learning.
We have pages for other topics: awesome-rnn, awesome-deep-vision, awesome-random-forest
Maintainers: Hyunsoo Kim, Jiwon Kim
We are looking for more contributors and maintainers!
Please feel free to send pull requests.
- Codes
- Theory
- Applications
- Tutorials / Websites
- Online Demos
- Open Source Reinforcement Learning Platforms
- Code for examples and exercises in Richard Sutton and Andrew Barto's Reinforcement Learning: An Introduction
- Simulation code for Reinforcement Learning Control Problems
- MATLAB Environment and GUI for Reinforcement Learning
- Reinforcement Learning Repository - University of Massachusetts, Amherst
- Brown-UMBC Reinforcement Learning and Planning Library (Java)
- Reinforcement Learning in R (MDP, Value Iteration)
- Reinforcement Learning Environment in Python and MATLAB
- RL-Glue (standard interface for RL) and RL-Glue Library
- PyBrain Library - Python-Based Reinforcement Learning, Artificial Intelligence, and Neural Network library
- RLPy Framework - Value-Function-Based Reinforcement Learning Framework for Education and Research
- Maja - Machine learning framework for problems in Reinforcement Learning in Python
- TeachingBox - Java-based Reinforcement Learning framework
- Policy Gradient Reinforcement Learning Toolbox for MATLAB
- PIQLE - Platform Implementing Q-Learning and other RL algorithms
- BeliefBox - Bayesian reinforcement learning library and toolkit
- Deep Q-Learning with TensorFlow - A deep Q-learning demonstration using Google TensorFlow
- Atari - Deep Q-networks and asynchronous agents in Torch
- AgentNet - A Python library for deep reinforcement learning and custom recurrent networks using Theano+Lasagne.
- Reinforcement Learning Examples by RLCode - A Collection of minimal and clean reinforcement learning examples
- OpenAI Baselines - Well-tested implementations (and results) of reinforcement learning algorithms from OpenAI
- PyTorch Deep RL - Popular deep RL algorithm implementations with PyTorch
- ChainerRL - Popular deep RL algorithm implementations with Chainer
- Black-DROPS - Modular and generic code for the model-based policy search Black-DROPS algorithm (IROS 2017 paper) and easy integration with the DART simulator
- [UCL] COMPM050/COMPGI13 Reinforcement Learning by David Silver
- [UC Berkeley] CS188 Artificial Intelligence by Pieter Abbeel
- [Udacity (Georgia Tech.)] CS7642 Reinforcement Learning
- [Stanford] CS229 Machine Learning - Lecture 16: Reinforcement Learning by Andrew Ng
- [UC Berkeley] Deep RL Bootcamp
- [UC Berkeley] CS294 Deep Reinforcement Learning by John Schulman and Pieter Abbeel
- [CMU] 10703: Deep Reinforcement Learning and Control, Spring 2017
- [MIT] 6.S094: Deep Learning for Self-Driving Cars
- [MIT] Deep Learning Basics -- Video slides
- [MIT] Introduction to Deep RL Video Slides
- [Siraj Raval] Introduction to AI for Video Games (Reinforcement Learning Video Series)
- [Stanford] CS234: Reinforcement Learning, Winter 2019
- [DeepMind] Advanced Deep Learning & Reinforcement Learning
- Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (1st Edition, 1998) [Book] [Code]
- Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (2nd Edition, in progress, 2018) [Book] [Code]
- Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
- David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
- Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
- Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]
- Deep Reinforcement Learning in Action [Book (Manning)]
- Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey, JAIR, 1996. [Paper]
- S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]
- Matthew E. Taylor, Peter Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR, 2009. [Paper]
- Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics: A Survey, IJRR, 2013. [Paper]
- Michael L. Littman, "Reinforcement learning improves behaviour from evaluative feedback." Nature 521.7553 (2015): 445-451. [Paper]
- Marc P. Deisenroth, Gerhard Neumann, Jan Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, 2014. [Book]
- Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath, A Brief Survey of Deep Reinforcement Learning, IEEE Signal Processing Magazine, 2017. [Paper]
- Ghavamzadeh, Mohammad, et al. "Bayesian reinforcement learning: A survey." Foundations and Trends® in Machine Learning 8.5-6 (2015): 359-483. [Paper]
- Buşoniu, Lucian, Robert Babuška, and Bart De Schutter. "A comprehensive survey of multiagent reinforcement learning." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38.2 (2008): 156-172. [Paper]
- Kiumarsi, Bahare, et al. "Optimal and autonomous control using reinforcement learning: A survey." IEEE Transactions on Neural Networks and Learning Systems 29.6 (2017): 2042-2062. [Paper]
- Mao, Qian, Fei Hu, and Qi Hao. "Deep learning for intelligent wireless networks: A comprehensive survey." IEEE Communications Surveys & Tutorials 20.4 (2018): 2595-2621. [Paper]
- Grondman, Ivo, et al. "A survey of actor-critic reinforcement learning: Standard and natural policy gradients." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42.6 (2012): 1291-1307. [Paper]
Foundational Papers
- Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper] (discusses issues in RL such as the "credit assignment problem")
- Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper] (earliest publication on the temporal-difference (TD) learning rule)
Methods
- Dynamic Programming (DP):
- Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
- Monte Carlo:
- Temporal-Difference:
- Richard S. Sutton, Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988. [Paper]
- Q-Learning (Off-policy TD algorithm):
- Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
- Sarsa (On-policy TD algorithm):
- R-Learning (learning of relative values)
- Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]
- Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration)
- Policy Search / Policy Gradient
- Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
- Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
- Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
- Jan Peters, Katharina Mülling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]
- Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
- Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
- Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
- Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
- Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik, Dorian Goepp, Vassilis Vassiliades, Jean-Baptiste Mouret, Black-Box Data-efficient Policy Search for Robotics, IROS, 2017. [Paper]
- Hierarchical RL
- Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)
- V. Mnih et al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
- Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies, arXiv, 16 Oct 2015. [arXiv]
- Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, arXiv, 18 Nov 2015. [arXiv]
- Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, arXiv, 22 Sep 2015. [arXiv]
- Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, arXiv, 4 Feb 2016. [arXiv]
Traditional Games
- Backgammon - "TD-Gammon" game play using TD(λ) (Tesauro, ACM 1995) [Paper]
- Chess - "KnightCap" program using TD(λ) (Baxter, arXiv 1999) [arXiv]
- Chess - Giraffe: Using deep reinforcement learning to play chess (Lai, arXiv 2015) [arXiv]
Computer Games
- Human-level Control through Deep Reinforcement Learning (Mnih, Nature 2015) [Paper] [Code] [Video]
- Flappy Bird Reinforcement Learning [Video]
- MarI/O - learning to play Mario with evolutionary reinforcement learning using artificial neural networks (Stanley, Evolutionary Computation 2002) [Paper] [Video]
- Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
- Robot Motor Skill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
- Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
- Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
- PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
- Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
- Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]
- Robots that can adapt like animals (Cully, Nature 2015) [Paper] [Video] [Code]
- Black-Box Data-efficient Policy Search for Robotics (Chatzilygeroudis, IROS 2017) [Paper] [Video] [Code]
- An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper] [Video]
- Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2001) [Paper]
- Scaling Average-reward Reinforcement Learning for Product Delivery (Proper, AAAI 2004) [Paper]
- Cross Channel Optimized Marketing by Reinforcement Learning (Abe, KDD 2004) [Paper]
- Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (Singh, JAIR 2002) [Paper]
- Mance Harmon and Stephanie Harmon, Reinforcement Learning: A Tutorial
- C. Igel, M.A. Riedmiller, et al., Reinforcement Learning in a Nutshell, ESANN, 2007. [Paper]
- UNSW - Reinforcement Learning
- ROS Reinforcement Learning Tutorial
- POMDP for Dummies
- Scholarpedia articles on:
- Repository with useful MATLAB Software, presentations, and demo videos
- Bibliography on Reinforcement Learning
- UC Berkeley - CS 294: Deep Reinforcement Learning, Fall 2015 (John Schulman, Pieter Abbeel) [Class Website]
- Blog posts on Reinforcement Learning, Parts 1-4 by Travis DeWolf
- The Arcade Learning Environment - Atari 2600 games environment for developing AI agents
- Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy
- Demystifying Deep Reinforcement Learning
- Let’s make a DQN
- Simple Reinforcement Learning with TensorFlow, Parts 0-8 by Arthur Juliani
- Practical_RL - GitHub-based course in reinforcement learning in the wild (lectures, coding labs, projects)
- RLenv.directory: Explore and find new reinforcement learning environments.
- Real-world demonstrations of Reinforcement Learning
- Deep Q-Learning Demo - A deep Q-learning demonstration using ConvNetJS
- Deep Q-Learning with TensorFlow - A deep Q-learning demonstration using Google TensorFlow
- Reinforcement Learning Demo - A reinforcement learning demo using reinforcejs by Andrej Karpathy
- OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms
- OpenAI Universe - A software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications
- DeepMind Lab - A customisable 3D platform for agent-based AI research
- Project Malmo - A platform for Artificial Intelligence experimentation and research built on top of Minecraft by Microsoft
- ViZDoom - Doom-based AI research platform for reinforcement learning from raw visual information
- Retro Learning Environment - An AI platform for reinforcement learning based on video game emulators. Currently supports SNES and Sega Genesis. Compatible with OpenAI Gym.
- torch-twrl - A package that enables reinforcement learning in Torch by Twitter
- UETorch - A Torch plugin for Unreal Engine 4 by Facebook
- TorchCraft - Connecting Torch to StarCraft
- rllab - A framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym
- TensorForce - Practical deep reinforcement learning on TensorFlow with Gitter support and OpenAI Gym/Universe/DeepMind Lab integration.
- tf-TRFL - A library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Learning agents.
- OpenAI Lab - An experimentation system for Reinforcement Learning using OpenAI Gym, TensorFlow, and Keras.
- keras-rl - State-of-the-art deep reinforcement learning algorithms in Keras designed for compatibility with OpenAI Gym.
- BURLAP - Brown-UMBC Reinforcement Learning and Planning, a library written in Java
- MAgent - A Platform for Many-agent Reinforcement Learning.
- Ray RLlib - A reinforcement learning library that aims to provide both performance and composability.
- SLM Lab - A research framework for Deep Reinforcement Learning using Unity, OpenAI Gym, PyTorch, TensorFlow.
- Unity ML Agents - Create reinforcement learning environments using the Unity Editor
- Intel Coach - A Python reinforcement learning research framework containing implementations of many state-of-the-art algorithms.