How to Learn Reinforcement Learning: A Step-by-step Guide

This repository provides the RL learning roadmap mentioned in the blog post How to Learn Reinforcement Learning: A Step-by-step Guide.

For complementary MATLAB coding exercises with solutions, see RL Course MATLAB.

The RL Learning Roadmap

I highly recommend working through the roadmap in order. After the first four chapters, you should have enough foundation to tackle the remaining chapters in any order.

For each chapter:
• Study the learning materials until you fully understand the required concepts.
• Implement the algorithm in your favorite framework. Learning happens when you implement and debug it yourself.
• Test it on a few RL problems. My favorites are cart-pole, inverted pendulum, walking robot, and pong.

Chapter	Algorithm	Required Concepts	Learning Materials
1	Dynamic Programming • Policy Evaluation • Policy Improvement • Value Iteration	• Markov Decision Process • Expected return • Discount factor • State, Observation • Action • Reward • State value function V(s) • State-action value function Q(s,a)	• MATLAB Tech Talk Part 1: What is RL? • MATLAB Tech Talk Part 2: Understanding the Environment and Rewards • RL Textbook - Chapter 3+4: Finite MDP + Dynamic Programming • WildML – Dynamic Programming exercises • David Silver’s Lecture 1+2
2	Temporal-Difference (TD) Learning • Q-Learning • SARSA	• TD Error • On-policy vs off-policy • Epsilon greedy	• RL Textbook - Chapter 6: Temporal Difference Learning • WildML – SARSA, Q-Learning exercises
3	Function Approximation (replace table with neural network) • Deep Q-Learning	RL: • Why tables cannot scale • Relationship with supervised learning • Replay memory • Target network • Partially observable environment • Frame stacking for Atari game environment • Typical DQN network • Double Q-learning; Deep Learning: • Supervised learning • Feedforward network • Convolutional neural network	RL: • David Silver’s Lecture 6: Value function approximation • WildML – Q-Learning with Linear Function Approximation • DeepMind DQN paper • WildML – Deep Q-Learning for Atari Games • Arthur Juliani’s series Part 4 – Deep Q-Networks • PyTorch DQN Tutorial; Deep Learning: • Deep Learning Specialization Course 1+2
4	Policy gradient • REINFORCE (vanilla policy gradient) • Actor Critic	• Actor • Critic • Stochastic policy • Statistics: distribution (focus on normal/Gaussian distribution), sample from a distribution, entropy, probability density function • How to model discrete stochastic policy vs continuous stochastic policy • Importance sampling • KL divergence	• RL Textbook – Chapter 13: Policy Gradient Methods • WildML – Policy Gradient exercises • OpenAI Spinning Up – Vanilla Policy Gradient • Deep RL Berkeley – Policy Gradients + Actor-Critic Algorithms
5	Advanced Policy Gradient • Deep Deterministic Policy Gradient (DDPG) • Twin Delayed DDPG (TD3) • Proximal Policy Optimization (PPO) • Trust Region Policy Optimization (TRPO)	• Continuous action space • Deterministic policy • Deterministic policy gradient	• Deep RL Berkeley – Advanced Policy Gradients • Original papers • OpenAI Spinning Up – PPO, TRPO, DDPG and TD3
6	Partially Observable Environment • Modify existing algorithms to work with recurrent neural network (RNN)	• Recurrent neural network (RNN) • Backpropagation through time • Observation stacking • How to sample data out of replay memory for RNN update	• Arthur Juliani’s series Part 6 – Partial Observability and DRQN • Deep Recurrent Q-Learning for Partially Observable MDPs • Memory-based control with recurrent neural networks
7	Model-based • Modify existing algorithms to use a model of the environment to simulate and plan	• Motivation: environment can be on actual hardware (high cost) • Model: an approximation of the environment • Environment step vs model step • Model-based planning • Model-based learning	• RL Textbook – Chapter 8: Planning and Learning with Tabular Methods (8.1-8.4) • Deep RL Berkeley – Model-based Planning • Deep RL Berkeley – Model-based Reinforcement Learning
8	Parallelization • A2C • A3C • IMPALA	• Parallelization for on-policy vs off-policy algorithms • Gradient parallelization • Experience parallelization	• Deep RL Berkeley – Distributed RL
9	Exploration	• Explore through sampling • Intrinsic motivation • Imitation learning	• Deep RL Berkeley – Exploration
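
Chapter 1 sketch: value iteration. Below is a minimal Python version, assuming a tiny hypothetical three-state MDP stored as nested dictionaries (transitions[state][action] is a list of (probability, next_state, reward, done) tuples). The layout and function names are illustrative, not taken from any of the listed materials.

```python
# Value iteration on a tiny, hypothetical MDP.
# transitions[state][action] -> list of (probability, next_state, reward, done)

def backup(outcomes, V, gamma):
    """Expected one-step return of an action under the current value estimates."""
    return sum(p * (r + (0.0 if done else gamma * V[s2])) for p, s2, r, done in outcomes)

def value_iteration(transitions, gamma=0.9, theta=1e-8):
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state keeps value 0
                continue
            # Bellman optimality backup: value of the best action
            best = max(backup(outcomes, V, gamma) for outcomes in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Policy improvement: act greedily with respect to the converged values
    policy = {s: max(actions, key=lambda a: backup(actions[a], V, gamma))
              for s, actions in transitions.items() if actions}
    return V, policy

mdp = {
    "A": {"stay": [(1.0, "A", 0.0, False)], "right": [(1.0, "B", 0.0, False)]},
    "B": {"left": [(1.0, "A", 0.0, False)], "right": [(1.0, "goal", 1.0, True)]},
    "goal": {},
}
V, policy = value_iteration(mdp)
print(V)       # {'A': 0.9, 'B': 1.0, 'goal': 0.0}
print(policy)  # {'A': 'right', 'B': 'right'}
```

The discount factor shows up directly: state A is one step further from the reward than B, so its value is gamma times B's.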
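
Chapter 2 sketch: tabular Q-learning with an epsilon-greedy behavior policy. The five-cell corridor (corridor_step) and all hyperparameters are hypothetical stand-ins so the example runs with the Python standard library alone; in practice you would move on to a problem such as cart-pole.

```python
import random

N_STATES = 5  # hypothetical corridor length; the last cell is the goal

def corridor_step(state, action):
    """Action 1 moves right, 0 moves left; reaching the last cell gives reward 1 and ends the episode."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the greedy action (ties broken randomly)."""
    if rng.random() < epsilon or q_values[0] == q_values[1]:
        return rng.randrange(2)
    return 0 if q_values[0] > q_values[1] else 1

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(Q[state], epsilon, rng)
            next_state, reward, done = corridor_step(state, action)
            # Off-policy target: bootstrap from the best action in next_state
            target = reward if done else reward + gamma * max(Q[next_state])
            td_error = target - Q[state][action]
            Q[state][action] += alpha * td_error
            state = next_state
    return Q

Q = q_learning()
print([0 if q[0] > q[1] else 1 for q in Q[:-1]])  # greedy action per non-goal state
```

Replacing max(Q[next_state]) with the Q-value of the action the epsilon-greedy policy actually takes next would turn this into SARSA, the on-policy variant.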
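
Chapter 3 sketch: replay memory and the DQN versus Double DQN targets. This is a minimal version under stated assumptions: Q-values are plain lists standing in for the outputs of the online and target networks, and the class and function names are hypothetical. A real implementation (for example, following the PyTorch DQN tutorial) would compute these with neural networks on batches.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")

class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are dropped once capacity is reached."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def dqn_target(reward, done, next_q_target, gamma=0.99):
    """DQN: the (frozen) target network both selects and evaluates the next action."""
    return reward if done else reward + gamma * max(next_q_target)

def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]

memory = ReplayMemory(capacity=3, seed=0)
for t in range(5):
    memory.push(t, 0, 0.0, t + 1, False)
print(len(memory))  # 3: only the newest transitions are kept
batch = memory.sample(2)
print(dqn_target(1.0, False, [2.0, 5.0], gamma=0.5))                     # 3.5
print(double_dqn_target(1.0, False, [3.0, 1.0], [2.0, 5.0], gamma=0.5))  # 2.0
```

The two targets differ whenever the online and target networks disagree about the best next action, which is how Double Q-learning reduces the overestimation caused by taking a max over noisy estimates.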
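
Frame stacking (Chapter 3, for Atari) and observation stacking (Chapter 6) both give a memoryless network a short history. A minimal sketch, assuming observations are plain Python values; with image frames you would stack arrays along a channel axis instead. ObservationStack is a hypothetical helper name.

```python
from collections import deque

class ObservationStack:
    """Keeps the k most recent observations; the stacked tuple becomes the agent's input."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # At episode start, pad the history with copies of the first observation
        self.frames.clear()
        self.frames.extend([first_obs] * self.k)
        return tuple(self.frames)

    def step(self, obs):
        # Appending to a full deque drops the oldest observation automatically
        self.frames.append(obs)
        return tuple(self.frames)

stack = ObservationStack(k=3)
print(stack.reset(0))  # (0, 0, 0)
print(stack.step(1))   # (0, 0, 1)
print(stack.step(2))   # (0, 1, 2)
print(stack.step(3))   # (1, 2, 3)
```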
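
Chapter 4 sketch: vanilla REINFORCE with a softmax policy. To keep it self-contained this assumes a hypothetical two-armed bandit (one-step episodes, so the return G is just the reward) and uses no baseline; the arm means, noise level and step size are illustrative. It relies on the softmax log-policy gradient, d log pi(a) / d theta_b = 1[a == b] - pi(b).

```python
import math
import random

def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Entropy of a discrete distribution; lower means a less random policy."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def reinforce_bandit(arm_means=(0.2, 0.8), steps=2000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # action preferences (policy parameters)
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the stochastic policy
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # One-step episode, so the return G is just the noisy reward
        G = rng.gauss(arm_means[action], 0.1)
        # Gradient ascent on G * log pi(action)
        for b in range(len(theta)):
            grad_log_pi = (1.0 if b == action else 0.0) - probs[b]
            theta[b] += alpha * G * grad_log_pi
    return softmax(theta)

probs = reinforce_bandit()
print(probs, entropy(probs))  # probability mass should concentrate on the better arm
```

Subtracting a baseline from G (for example, a running average of rewards, or a learned critic as in actor-critic) leaves the expected gradient unchanged but reduces its variance.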
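
Chapter 5 sketch: the per-sample PPO-Clip surrogate objective, which combines several Chapter 4 concepts (Gaussian policy, probability density, importance sampling). This is a scalar sketch with hypothetical function names; a real implementation averages it over a batch and maximizes it with automatic differentiation. TRPO instead constrains the KL divergence between the old and new policies.

```python
import math

def gaussian_log_prob(x, mean, std):
    """Log-density of a 1-D normal distribution, as used by continuous stochastic policies."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def ppo_clip_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Per-sample PPO-Clip surrogate (to be maximized)."""
    ratio = math.exp(new_log_prob - old_log_prob)  # importance-sampling ratio pi_new / pi_old
    clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms
    return min(ratio * advantage, clipped * advantage)

old_lp = gaussian_log_prob(1.0, mean=0.0, std=1.0)
new_lp = gaussian_log_prob(1.0, mean=0.5, std=1.0)
print(math.exp(new_lp - old_lp))                           # ratio e^0.375, about 1.455
print(ppo_clip_objective(new_lp, old_lp, advantage=1.0))   # clipped to 1.2
print(ppo_clip_objective(new_lp, old_lp, advantage=-1.0))  # unclipped, about -1.455
```

With a positive advantage the gain from pushing the ratio above 1 + clip_eps is cut off, while with a negative advantage the full penalty is kept, so a single update cannot move the policy far from the one that collected the data.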
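
Chapter 6 sketch: sampling data out of replay memory for an RNN update. One common approach, assumed here, is to store whole episodes and sample fixed-length contiguous sequences that never cross an episode boundary; it is similar in spirit to the random-update scheme in the DRQN paper, but the class and method names are hypothetical and transitions are shown as integers for readability.

```python
import random

class EpisodeReplay:
    """Stores whole episodes so contiguous, fixed-length sequences can be sampled for RNN updates."""

    def __init__(self, seed=None):
        self.episodes = []
        self.rng = random.Random(seed)

    def add_episode(self, transitions):
        self.episodes.append(list(transitions))

    def sample_sequences(self, batch_size, seq_len):
        # Only episodes long enough to supply a full sequence are eligible
        eligible = [ep for ep in self.episodes if len(ep) >= seq_len]
        if not eligible:
            raise ValueError("no stored episode is at least seq_len steps long")
        batch = []
        for _ in range(batch_size):
            ep = self.rng.choice(eligible)
            start = self.rng.randrange(len(ep) - seq_len + 1)
            # Contiguous slice, so backpropagation through time sees a real trajectory
            batch.append(ep[start:start + seq_len])
        return batch

memory = EpisodeReplay(seed=0)
memory.add_episode(range(10))   # a 10-step episode
memory.add_episode([100, 101])  # too short for seq_len=4, never sampled
for seq in memory.sample_sequences(batch_size=3, seq_len=4):
    print(seq)
```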
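
Chapter 7 sketch: separating environment steps from model steps. This is close to tabular Dyna-Q from Chapter 8 of the RL textbook, under simplifying assumptions: a hypothetical deterministic corridor, a model that just memorizes the last observed outcome of each state-action pair, and planning updates that sample uniformly over stored pairs.

```python
import random

N_STATES = 6  # hypothetical corridor length; the last cell is the goal

def corridor_step(state, action):
    """Real environment step: 1 moves right, 0 moves left; the last cell gives reward 1 and ends the episode."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    model = {}  # (state, action) -> (reward, next_state, done), learned from real steps

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])

    def act(s):
        if rng.random() < epsilon or Q[s][0] == Q[s][1]:
            return rng.randrange(2)
        return 0 if Q[s][0] > Q[s][1] else 1

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = act(s)
            s2, r, done = corridor_step(s, a)  # environment step (costly on real hardware)
            update(s, a, r, s2, done)          # direct RL from real experience
            model[(s, a)] = (r, s2, done)      # model learning
            for _ in range(planning_steps):    # planning: cheap simulated model steps
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
print([0 if q[0] > q[1] else 1 for q in Q[:-1]])  # greedy action per non-goal state
```

Setting planning_steps=0 recovers plain Q-learning, which makes it easy to compare how many real environment steps each variant needs.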
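
Chapter 9 sketch: a simple form of intrinsic motivation. This assumes a count-based bonus of beta / sqrt(N(s)) added to the extrinsic reward. It only works when states are hashable and revisited often (small discrete spaces); CountBonus and beta are hypothetical names.

```python
import math
from collections import defaultdict

class CountBonus:
    """Adds an intrinsic reward beta / sqrt(N(s)) that shrinks as a state is revisited."""

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)  # N(s): visit count per (hashable) state

    def __call__(self, state, extrinsic_reward):
        self.counts[state] += 1
        return extrinsic_reward + self.beta / math.sqrt(self.counts[state])

shaped = CountBonus(beta=0.1)
print(shaped("s0", 0.0))  # first visit: 0.1
print(shaped("s0", 0.0))  # second visit: 0.1 / sqrt(2), about 0.0707
print(shaped("s1", 1.0))  # new state: 1.0 + 0.1
```

Feeding the shaped reward into any of the earlier learners rewards the agent for reaching unfamiliar states, and the bonus fades as they become familiar.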

References

• Reinforcement Learning Toolbox (software), The MathWorks
• Reinforcement Learning: An Introduction (textbook), Sutton and Barto
• Deep Reinforcement Learning (course), UC Berkeley
• OpenAI Spinning Up (textbook/blog), OpenAI
• WildML Learning Reinforcement Learning (Python course with exercises/solutions), Denny Britz
• MATLAB RL Tech Talks (videos), The MathWorks
• RL Course (lectures), David Silver
• Simple Reinforcement Learning (blog), Arthur Juliani
• Deep Learning Specialization (Coursera course), Andrew Ng. You can audit it for free; courses 1 and 2 are highly recommended for Deep Learning foundations.

anhOfTheStars/RLStudyGuide