- 1.1. Basic Idea of Reinforcement Learning
- 1.2. Key Elements of Reinforcement Learning
- 1.3. Reinforcement Learning Algorithm
- 1.4. RL Agent in the Grid World
- 1.5. How RL Differs from Other ML Paradigms
- 1.6. Markov Decision Processes
- 1.7. Action Space, Policy, Episode and Horizon
- 1.8. Return, Discount Factor and Math Essentials
- 1.9. Value Function and Q Function
- 1.10. Model-Based and Model-Free Learning
- 1.11. Different Types of Environments
- 1.12. Applications of Reinforcement Learning
- 1.13. Reinforcement Learning Glossary
- 2.1. Setting Up our Machine
- 2.2. Creating our First Gym Environment
- 2.3. Generating an Episode
- 2.4. Classic Control Environments
- 2.5. Cart Pole Balancing with Random Policy
- 2.6. Atari Game Environments
- 2.7. Agent Playing the Tennis Game
- 2.8. Recording the Game
- 2.9. Other Environments
- 2.10. Environment Synopsis
- 3.1. The Bellman Equation
- 3.2. Bellman Optimality Equation
- 3.3. Relation Between Value and Q Function
- 3.4. Dynamic Programming
- 3.5. Value Iteration
- 3.6. Solving the Frozen Lake Problem with Value Iteration
- 3.7. Policy Iteration
- 3.8. Solving the Frozen Lake Problem with Policy Iteration
- 3.9. Is DP Applicable to All Environments?
- 4.1. Understanding the Monte Carlo Method
- 4.2. Prediction and Control Tasks
- 4.3. Monte Carlo Prediction
- 4.4. Understanding the Blackjack Game
- 4.5. Every-visit MC Prediction with Blackjack Game
- 4.6. First-visit MC Prediction with Blackjack Game
- 4.7. Incremental Mean Updates
- 4.8. MC Prediction (Q Function)
- 4.9. Monte Carlo Control
- 4.10. On-Policy Monte Carlo Control
- 4.11. Monte Carlo Exploring Starts
- 4.12. Monte Carlo with Epsilon-Greedy Policy
- 4.13. Implementing On-Policy MC Control
- 4.14. Off-Policy Monte Carlo Control
- 4.15. Is the MC Method Applicable to All Tasks?
- 5.1. TD Learning
- 5.2. TD Prediction
- 5.3. Predicting the Value of States in a Frozen Lake Environment
- 5.4. TD Control
- 5.5. On-Policy TD Control - SARSA
- 5.6. Computing the Optimal Policy using SARSA
- 5.7. Off-Policy TD Control - Q Learning
- 5.8. Computing the Optimal Policy using Q Learning
- 5.9. The Difference Between Q Learning and SARSA
- 5.10. Comparing DP, MC, and TD Methods
- 6.1. The MAB Problem
- 6.2. Creating a Bandit in the Gym
- 6.3. Epsilon-Greedy
- 6.4. Implementing Epsilon-Greedy
- 6.5. Softmax Exploration
- 6.6. Implementing Softmax Exploration
- 6.7. Upper Confidence Bound
- 6.8. Implementing UCB
- 6.9. Thompson Sampling
- 6.10. Implementing Thompson Sampling
- 6.11. Applications of MAB
- 6.12. Finding the Best Advertisement Banner using Bandits
- 6.13. Contextual Bandits
- 7.1. What is a Deep Q Network?
- 7.2. Understanding DQN
- 7.3. Playing Atari Games using DQN
- 7.4. Double DQN
- 7.5. DQN with Prioritized Experience Replay
- 7.6. Dueling DQN
- 7.7. Deep Recurrent Q Network
- 8.1. Why Policy-Based Methods?
- 8.2. Policy Gradient Intuition
- 8.3. Understanding the Policy Gradient
- 8.4. Deriving the Policy Gradient
- 8.5. Variance Reduction Methods
- 8.6. Policy Gradient with Reward-to-go
- 8.7. Cart Pole Balancing with Policy Gradient
- 8.8. Policy Gradient with Baseline
- 9.1. Overview of the Actor Critic Method
- 9.2. Understanding the Actor Critic Method
- 9.3. Advantage Actor Critic
- 9.4. Asynchronous Advantage Actor Critic
- 9.5. Mountain Car Climbing using A3C
- 9.6. A2C Revisited
- 10.1. Deep Deterministic Policy Gradient
- 10.2. Swinging Up the Pendulum using DDPG
- 10.3. Twin Delayed DDPG
- 10.4. Soft Actor Critic
- 11.1. Trust Region Policy Optimization
- 11.2. Math Essentials
- 11.3. Designing the TRPO Objective Function
- 11.4. Solving the TRPO Objective Function
- 11.5. Algorithm - TRPO
- 11.6. Proximal Policy Optimization
- 11.7. PPO with Clipped Objective
- 11.9. Implementing PPO-Clipped Method
- 11.10. PPO with Penalized Objective
- 11.11. Actor Critic using Kronecker-Factored Trust Region
- 11.12. Math Essentials
- 11.13. Kronecker-Factored Approximate Curvature (K-FAC)
- 11.14. K-FAC in Actor Critic
- 12.1. Why Distributional Reinforcement Learning?
- 12.2. Categorical DQN
- 12.3. Playing Atari Games using Categorical DQN
- 12.4. Quantile Regression DQN
- 12.5. Math Essentials
- 12.6. Understanding QR-DQN
- 12.7. Distributed Distributional DDPG
- 13.1. Supervised Imitation Learning
- 13.2. DAgger
- 13.3. Deep Q-Learning from Demonstrations
- 13.4. Inverse Reinforcement Learning
- 13.5. Maximum Entropy IRL
- 13.6. Generative Adversarial Imitation Learning
- 14.1. Creating our First Agent with Baseline
- 14.2. Multiprocessing with Vectorized Environments
- 14.3. Integrating the Custom Environments
- 14.4. Playing Atari Games with DQN
- 14.5. Implementing DQN Variants
- 14.6. Lunar Lander using A2C
- 14.7. Creating a Custom Network
- 14.8. Swinging up a Pendulum using DDPG
- 14.9. Training an Agent to Walk using TRPO
- 14.10. Training Cheetah Bot to Run using PPO
- 15.1. Meta Reinforcement Learning
- 15.2. Model-Agnostic Meta-Learning
- 15.3. Understanding MAML
- 15.4. MAML in the Supervised Learning Setting
- 15.5. Algorithm - MAML in Supervised Learning
- 15.6. MAML in the Reinforcement Learning Setting
- 15.7. Algorithm - MAML in Reinforcement Learning
- 15.8. Hierarchical Reinforcement Learning
- 15.9. MAXQ Value Function Decomposition
- 15.10. Imagination-Augmented Agents