How to Learn Reinforcement Learning: A Step-by-step Guide

This repository provides the RL learning roadmap mentioned in the blog post "How to Learn Reinforcement Learning: A Step-by-step Guide".

For complimentary MATLAB coding exercises with solutions, see RL Course MATLAB.

The RL Learning Roadmap

I highly recommend working through the roadmap in order. After the first four chapters, you should have enough of a foundation to take the remaining chapters in any order.

For each chapter:

  • Make sure you fully understand the required concepts by working through the learning materials.
  • Implement the algorithm in your favorite framework. Learning happens when you implement and debug it yourself.
  • Test it out on some RL problems. My favorites are cart-pole, inverted pendulum, walking robot, and Pong.

Each chapter below lists, in order: the algorithm to implement, the required concepts, and the learning materials.

Chapter 1: Dynamic Programming

Algorithm:
• Policy Evaluation
• Policy Improvement
• Value Iteration

Required Concepts:
• Markov Decision Process
• Expected return
• Discount factor
• State, Observation
• Action
• Reward
• State value function V(s)
• State-action value function Q(s,a)

Learning Materials:
MATLAB Tech Talk Part 1: What is RL?
MATLAB Tech Talk Part 2: Understanding the Environment and Rewards
RL Textbook – Chapter 3+4: Finite MDP + Dynamic Programming
WildML – Dynamic Programming exercises
David Silver’s Lecture 1+2
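
To make the Value Iteration item concrete, here is a minimal sketch in plain Python. The three-state chain MDP, the `P` table layout, and the function names are my own assumptions for illustration, not something taken from the learning materials. It repeatedly applies the Bellman optimality backup and then reads off a greedy policy.

```python
# Hypothetical 3-state chain MDP (not from the roadmap): state 2 is terminal.
# P[state][action] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},  # terminal: no actions available
}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Sweep the Bellman optimality backup until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # terminal states keep V(s) = 0
                continue
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Policy improvement: act greedily with respect to the converged values.
    policy = {
        s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
        for s in P
        if P[s]
    }
    return V, policy


V, policy = value_iteration(P)
print(V)
print(policy)
```

On this toy chain the greedy policy should move right in both non-terminal states, since the only reward sits at the right end.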

Chapter 2: Temporal-Difference (TD) Learning

Algorithm:
• Q-Learning
• SARSA

Required Concepts:
• TD error
• On-policy vs off-policy
• Epsilon-greedy

Learning Materials:
RL Textbook – Chapter 6: Temporal Difference Learning
WildML – SARSA, Q-Learning exercises
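
As a rough illustration of how the TD error, epsilon-greedy exploration, and the off-policy target fit together, here is a small tabular Q-learning sketch. The five-state chain environment, the hyperparameters, and all function names are assumptions I made up for the example.

```python
import random

N_STATES = 5  # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = ("left", "right")


def step(state, action):
    """Deterministic chain dynamics: reward 1 only when the goal is reached."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # step cap so early random walks always end
            action = epsilon_greedy(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action.
            # SARSA would instead use the action actually taken next.
            bootstrap = 0.0 if done else gamma * max(Q[(next_state, a)] for a in ACTIONS)
            td_error = reward + bootstrap - Q[(state, action)]
            Q[(state, action)] += alpha * td_error
            if done:
                break
            state = next_state
    return Q


def greedy_policy(Q):
    """Best action per non-terminal state according to the learned table."""
    return {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}


Q = q_learning()
print(greedy_policy(Q))
```

Swapping the `max` in the bootstrap term for the value of the next action chosen by `epsilon_greedy` would turn this into SARSA, which is a handy way to see the on-policy vs off-policy difference in code.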

Chapter 3: Function Approximation (replace table with neural network)

Algorithm:
• Deep Q-Learning

Required Concepts:
RL:
• Why tables cannot scale
• Relationship with supervised learning
• Replay memory
• Target network
• Partially observable environment
• Frame stacking for Atari game environments
• Typical DQN network
• Double Q-Learning
Deep Learning:
• Supervised learning
• Feedforward network
• Convolutional neural network

Learning Materials:
RL:
David Silver’s Lecture 6: Value function approximation
WildML – Q-Learning with Linear Function Approximation
DeepMind DQN paper
WildML – Deep Q-Learning for Atari Games
Arthur Juliani’s series Part 4 – Deep Q-Networks
PyTorch DQN Tutorial
Deep Learning:
Deep Learning Specialization Course 1+2
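
Two of the concepts above, replay memory and the target network, are easy to sketch without any deep learning framework. The class and function names below are hypothetical, and `target_q` is only a stand-in for a frozen target network (any callable that maps a state to a list of action values).

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(batch, target_q, gamma=0.99):
    """Regression targets y = r + gamma * max_a' Q_target(s', a'), or y = r at episode end."""
    return [
        r if done else r + gamma * max(target_q(s2))
        for (_s, _a, r, s2, done) in batch
    ]


memory = ReplayMemory(capacity=3, seed=0)
for t in range(5):
    memory.push(t, 0, 1.0, t + 1, t == 4)
print(len(memory))  # only the 3 most recent transitions are kept
print(dqn_targets(memory.sample(2), target_q=lambda s: [0.0, 1.0]))
```

In a full DQN the online network is trained to regress onto these targets, and the target network's weights are only refreshed from the online network every so often.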

Chapter 4: Policy Gradient

Algorithm:
• REINFORCE (vanilla policy gradient)
• Actor-Critic

Required Concepts:
• Actor
• Critic
• Stochastic policy
• Statistics: distribution (focus on normal/Gaussian distribution), sample from a distribution, entropy, probability density function
• How to model discrete stochastic policy vs continuous stochastic policy
• Importance sampling
• KL divergence

Learning Materials:
RL Textbook – Chapter 13: Policy Gradient Methods
WildML – Policy Gradient exercises
OpenAI Spinning Up – Vanilla Policy Gradient
Deep RL Berkeley – Policy Gradients + Actor-Critic Algorithms
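
Here is a minimal sketch of REINFORCE with a discrete stochastic (softmax) policy. It uses a made-up two-armed, one-step problem with deterministic rewards so that it runs with the standard library alone. The function names and hyperparameters are assumptions of mine, and there is no baseline or critic.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a discrete stochastic policy."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def reinforce_bandit(arm_rewards=(0.0, 1.0), steps=1000, lr=0.5, seed=0):
    """REINFORCE on a hypothetical one-step problem with deterministic arm rewards."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)  # one preference per action
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        G = arm_rewards[action]  # episode return (single step, so no discounting)
        for k in range(len(theta)):
            # Gradient of log pi(action) w.r.t. theta_k for a softmax policy.
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * G * grad_log_pi
    return softmax(theta)


print(reinforce_bandit())
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

For multi-step episodes you would weight each step's `grad_log_pi` by the matching entry of `discounted_returns`. An actor-critic method replaces that return with an estimate from a learned critic.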

Chapter 5: Advanced Policy Gradient

Algorithm:
• Deep Deterministic Policy Gradient (DDPG)
• Twin Delayed DDPG (TD3)
• Proximal Policy Optimization (PPO)
• Trust Region Policy Optimization (TRPO)

Required Concepts:
• Continuous action space
• Deterministic policy
• Deterministic policy gradient

Learning Materials:
Deep RL Berkeley – Advanced Policy Gradients
Original papers
OpenAI Spinning Up – PPO, TRPO, DDPG and TD3
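
For PPO, the piece that usually needs the most care is the clipped surrogate objective. The sketch below computes it for a single sample. The function name and argument names are my own, and a real implementation would average this over a batch and maximize it with a deep learning framework.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """Per-sample PPO-Clip surrogate: min(ratio * A, clip(ratio, 1-eps, 1+eps) * A)."""
    ratio = math.exp(new_logp - old_logp)  # importance-sampling ratio pi_new / pi_old
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)


# Positive advantage: gains from raising the action's probability are capped.
print(ppo_clip_objective(math.log(1.5), 0.0, advantage=1.0))
# Negative advantage: the unclipped (more pessimistic) term is kept.
print(ppo_clip_objective(math.log(1.5), 0.0, advantage=-1.0))
```

The `min` makes the objective a pessimistic bound, so the policy gets no extra credit for moving the ratio far outside the `[1 - clip_eps, 1 + clip_eps]` band in the direction the advantage favors.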

Chapter 6: Partially Observable Environment

Algorithm:
• Modify existing algorithms to work with a recurrent neural network (RNN)

Required Concepts:
• Recurrent neural network (RNN)
• Backpropagation through time
• Observation stacking
• How to sample data out of replay memory for RNN update

Learning Materials:
Arthur Juliani’s series Part 6 – Partial Observability and DRQN
Deep Recurrent Q-Learning for Partially Observable MDPs
Memory-based control with recurrent neural networks
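
Observation stacking is the simplest of these ideas to try first. The sketch below keeps the last `k` observations in a fixed-length queue. The class name and the choice to pad with the first observation on reset are assumptions for illustration, not a prescribed design.

```python
from collections import deque


class ObservationStack:
    """Keep the k most recent observations and expose them as one stacked input."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # Pad with copies of the first observation so the stack is full from step 0.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def push(self, obs):
        self.frames.append(obs)  # the oldest observation drops out automatically
        return tuple(self.frames)


stack = ObservationStack(k=3)
print(stack.reset("o0"))
print(stack.push("o1"))
```

The stacked tuple is then fed to the agent in place of the single observation. An RNN-based agent instead carries a hidden state across steps, which is why replay sampling has to return sequences rather than isolated transitions.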

Chapter 7: Model-based

Algorithm:
• Modify existing algorithms to utilize a model of the environment to simulate and plan

Required Concepts:
• Motivation: environment can be on actual hardware (high cost)
• Model: an approximation of the environment
• Environment step vs model step
• Model-based planning
• Model-based learning

Learning Materials:
RL Textbook – Chapter 8: Planning and Learning with Tabular Methods (8.1-8.4)
Deep RL Berkeley – Model-based Planning
Deep RL Berkeley – Model-based Reinforcement Learning
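
To show the difference between an environment step and a model step, here is a small Dyna-Q style sketch in the spirit of the tabular methods in Chapter 8 of the RL textbook. The chain environment, the dictionary-based model, and the hyperparameters are assumptions of mine. Each real step is followed by several cheap planning updates replayed from the learned model.

```python
import random

ACTIONS = ("left", "right")
GOAL = 4  # hypothetical chain: states 0..4, reward 1 on reaching state 4


def env_step(state, action):
    """Stand-in for the real (expensive) environment."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_update(Q, s, a, r, s2, done, alpha, gamma):
    """One Q-learning backup, used for both real and simulated transitions."""
    target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    model = {}  # learned model: (s, a) -> (r, s2, done)
    env_steps = model_steps = 0
    for _ in range(episodes):
        s = 0
        for _ in range(200):  # step cap per episode
            best = max(Q[(s, b)] for b in ACTIONS)
            greedy = [b for b in ACTIONS if Q[(s, b)] == best]
            a = rng.choice(ACTIONS) if rng.random() < epsilon else rng.choice(greedy)
            s2, r, done = env_step(s, a)  # environment step
            env_steps += 1
            q_update(Q, s, a, r, s2, done, alpha, gamma)
            model[(s, a)] = (r, s2, done)  # model learning
            for _ in range(planning_steps):  # model steps (planning)
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(Q, ps, pa, pr, ps2, pdone, alpha, gamma)
                model_steps += 1
            if done:
                break
            s = s2
    return Q, env_steps, model_steps


Q, env_steps, model_steps = dyna_q()
print(env_steps, model_steps)
```

With `planning_steps=10`, every costly environment step is reused for ten model steps, which is the whole appeal when the environment is real hardware.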

Chapter 8: Parallelization

Algorithm:
• A2C
• A3C
• IMPALA

Required Concepts:
• Parallelization for on-policy vs off-policy algorithms
• Gradient parallelization
• Experience parallelization

Learning Materials:
Deep RL Berkeley – Distributed RL

Chapter 9: Exploration

Required Concepts:
• Explore through sampling
• Intrinsic motivation
• Imitation learning

Learning Materials:
Deep RL Berkeley – Exploration
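
One simple form of intrinsic motivation is a count-based exploration bonus, where rarely visited states earn extra reward. The sketch below is only an illustration under my own assumptions (the class name, the `beta / sqrt(N(s))` form, and hashable states). It is not the specific method taught in the course.

```python
import math
from collections import Counter


class CountBonus:
    """Intrinsic reward beta / sqrt(N(s)) that shrinks as a state is revisited."""

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = Counter()

    def __call__(self, state):
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])


bonus = CountBonus(beta=1.0)
for _ in range(4):
    print(bonus("s0"))  # decreases with every visit to the same state
```

During training the bonus would be added to the environment reward, so the agent is nudged toward states it has seen less often.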

References

Reinforcement Learning Toolbox, The MathWorks
Reinforcement Learning: An Introduction (textbook), Sutton and Barto
Deep Reinforcement Learning (course), UC Berkeley
OpenAI Spinning Up (textbook/blog)
WildML Learning Reinforcement Learning (Python course with exercises/solutions), Denny Britz
MATLAB RL Tech Talks (videos), The MathWorks
David Silver’s RL course
Simple Reinforcement Learning (blog), Arthur Juliani
Deep Learning Specialization (Coursera course), Andrew Ng (you can audit it for free; I highly recommend Courses 1 and 2 for deep learning foundations)