1. Fundamentals of Reinforcement Learning

  • 1.1. Basic Idea of Reinforcement Learning
  • 1.2. Key Elements of Reinforcement Learning
  • 1.3. Reinforcement Learning Algorithm
  • 1.4. RL Agent in the Grid World
  • 1.5. How RL Differs from Other ML Paradigms
  • 1.6. Markov Decision Processes
  • 1.7. Action Space, Policy, Episode and Horizon
  • 1.8. Return, Discount Factor and Math Essentials
  • 1.9. Value Function and Q Function
  • 1.10. Model-Based and Model-Free Learning
  • 1.11. Different Types of Environments
  • 1.12. Applications of Reinforcement Learning
  • 1.13. Reinforcement Learning Glossary

2. A Guide to the Gym Toolkit

  • 2.1. Setting Up our Machine
  • 2.2. Creating our First Gym Environment
  • 2.3. Generating an Episode
  • 2.4. Classic Control Environments
  • 2.5. Cart Pole Balancing with Random Policy
  • 2.6. Atari Game Environments
  • 2.7. Agent Playing the Tennis Game
  • 2.8. Recording the Game
  • 2.9. Other Environments
  • 2.10. Environment Synopsis

3. Bellman Equation and Dynamic Programming

  • 3.1. The Bellman Equation
  • 3.2. Bellman Optimality Equation
  • 3.3. Relation Between Value and Q Function
  • 3.4. Dynamic Programming
  • 3.5. Value Iteration
  • 3.6. Solving the Frozen Lake Problem with Value Iteration
  • 3.7. Policy Iteration
  • 3.8. Solving the Frozen Lake Problem with Policy Iteration
  • 3.9. Is DP Applicable to All Environments?

4. Monte Carlo Methods

  • 4.1. Understanding the Monte Carlo Method
  • 4.2. Prediction and Control Tasks
  • 4.3. Monte Carlo Prediction
  • 4.4. Understanding the Blackjack Game
  • 4.5. Every-visit MC Prediction with Blackjack Game
  • 4.6. First-visit MC Prediction with Blackjack Game
  • 4.7. Incremental Mean Updates
  • 4.8. MC Prediction (Q Function)
  • 4.9. Monte Carlo Control
  • 4.10. On-Policy Monte Carlo Control
  • 4.11. Monte Carlo Exploring Starts
  • 4.12. Monte Carlo with Epsilon-Greedy Policy
  • 4.13. Implementing On-Policy MC Control
  • 4.14. Off-Policy Monte Carlo Control
  • 4.15. Is the MC Method Applicable to All Tasks?

5. Understanding Temporal Difference Learning

  • 5.1. TD Learning
  • 5.2. TD Prediction
  • 5.3. Predicting the Value of States in a Frozen Lake Environment
  • 5.4. TD Control
  • 5.5. On-Policy TD Control - SARSA
  • 5.6. Computing Optimal Policy using SARSA
  • 5.7. Off-Policy TD Control - Q Learning
  • 5.8. Computing the Optimal Policy using Q Learning
  • 5.9. The Difference Between Q Learning and SARSA
  • 5.10. Comparing DP, MC, and TD Methods

6. Case Study: The MAB Problem

  • 6.1. The MAB Problem
  • 6.2. Creating a Bandit in the Gym
  • 6.3. Epsilon-Greedy
  • 6.4. Implementing Epsilon-Greedy
  • 6.5. Softmax Exploration
  • 6.6. Implementing Softmax Exploration
  • 6.7. Upper Confidence Bound
  • 6.8. Implementing UCB
  • 6.9. Thompson Sampling
  • 6.10. Implementing Thompson Sampling
  • 6.11. Applications of MAB
  • 6.12. Finding the Best Advertisement Banner using Bandits
  • 6.13. Contextual Bandits

7. Deep Q Network and its Variants

  • 7.1. What is a Deep Q Network?
  • 7.2. Understanding DQN
  • 7.3. Playing Atari Games using DQN
  • 7.4. Double DQN
  • 7.5. DQN with Prioritized Experience Replay
  • 7.6. Dueling DQN
  • 7.7. Deep Recurrent Q Network

8. Policy Gradient Method

  • 8.1. Why Policy Based Methods?
  • 8.2. Policy Gradient Intuition
  • 8.3. Understanding the Policy Gradient
  • 8.4. Deriving the Policy Gradient
  • 8.5. Variance Reduction Methods
  • 8.6. Policy Gradient with Reward-to-go
  • 8.7. Cart Pole Balancing with Policy Gradient
  • 8.8. Policy Gradient with Baseline

9. Actor Critic Methods - A2C and A3C

  • 9.1. Overview of the Actor Critic Method
  • 9.2. Understanding the Actor Critic Method
  • 9.3. Advantage Actor Critic
  • 9.4. Asynchronous Advantage Actor Critic
  • 9.5. Mountain Car Climbing using A3C
  • 9.6. A2C Revisited

10. Learning DDPG, TD3 and SAC

  • 10.1. Deep Deterministic Policy Gradient
  • 10.2. Swinging Up the Pendulum using DDPG
  • 10.3. Twin Delayed DDPG
  • 10.4. Soft Actor Critic

11. TRPO, PPO and ACKTR Methods

  • 11.1. Trust Region Policy Optimization
  • 11.2. Math Essentials
  • 11.3. Designing the TRPO Objective Function
  • 11.4. Solving the TRPO Objective Function
  • 11.5. Algorithm - TRPO
  • 11.6. Proximal Policy Optimization
  • 11.7. PPO with Clipped Objective
  • 11.8. Implementing PPO-Clipped Method
  • 11.9. PPO with Penalized Objective
  • 11.10. Actor Critic using Kronecker Factored Trust Region
  • 11.11. Math Essentials
  • 11.12. Kronecker-Factored Approximate Curvature (K-FAC)
  • 11.13. K-FAC in Actor Critic

12. Distributional Reinforcement Learning

  • 12.1. Why Distributional Reinforcement Learning?
  • 12.2. Categorical DQN
  • 12.3. Playing Atari Games using Categorical DQN
  • 12.4. Quantile Regression DQN
  • 12.5. Math Essentials
  • 12.6. Understanding QR-DQN
  • 12.7. Distributed Distributional DDPG

13. Imitation Learning and Inverse RL

  • 13.1. Supervised Imitation Learning
  • 13.2. DAgger
  • 13.3. Deep Q Learning from Demonstrations
  • 13.4. Inverse Reinforcement Learning
  • 13.5. Maximum Entropy IRL
  • 13.6. Generative Adversarial Imitation Learning

14. Deep Reinforcement Learning with Stable Baselines

  • 14.1. Creating our First Agent with Stable Baselines
  • 14.2. Multiprocessing with Vectorized Environments
  • 14.3. Integrating the Custom Environments
  • 14.4. Playing Atari Games with DQN
  • 14.5. Implementing DQN Variants
  • 14.6. Lunar Lander using A2C
  • 14.7. Creating a Custom Network
  • 14.8. Swinging up a Pendulum using DDPG
  • 14.9. Training an Agent to Walk using TRPO
  • 14.10. Training Cheetah Bot to Run using PPO

15. Reinforcement Learning Frontiers

  • 15.1. Meta Reinforcement Learning
  • 15.2. Model Agnostic Meta Learning
  • 15.3. Understanding MAML
  • 15.4. MAML in the Supervised Learning Setting
  • 15.5. Algorithm - MAML in Supervised Learning
  • 15.6. MAML in the Reinforcement Learning Setting
  • 15.7. Algorithm - MAML in Reinforcement Learning
  • 15.8. Hierarchical Reinforcement Learning
  • 15.9. MAXQ Value Function Decomposition
  • 15.10. Imagination Augmented Agents