PyTorch 0.4 implementation of: actor critic / proximal policy optimization / acer / ddpg / twin delayed ddpg / soft actor critic / generative adversarial imitation learning / hindsight experience replay

RL-Adventure-2: Policy Gradients

PyTorch tutorial of: actor critic / proximal policy optimization / acer / ddpg / twin delayed ddpg / soft actor critic / generative adversarial imitation learning / hindsight experience replay

The deep reinforcement learning community has made several improvements to policy gradient algorithms. This tutorial presents the latest extensions in the following order:

  1. Advantage Actor Critic (A2C)
  2. High-Dimensional Continuous Control Using Generalized Advantage Estimation
  3. Proximal Policy Optimization Algorithms
  4. Sample Efficient Actor-Critic with Experience Replay
  5. Continuous control with deep reinforcement learning
  6. Addressing Function Approximation Error in Actor-Critic Methods
  7. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
  8. Generative Adversarial Imitation Learning
  9. Hindsight Experience Replay
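
To give a flavor of what these extensions look like in code, here is a minimal sketch of generalized advantage estimation, the second entry in the list above. It is written in plain Python rather than PyTorch so it runs without dependencies, and it is not taken from the notebooks: the function name `compute_gae`, its argument names, and the default values of `gamma` and `lam` are assumptions made for illustration.

```python
def compute_gae(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    """Sketch of generalized advantage estimation for a single rollout.

    rewards, values, dones: per-step lists of equal length.
    next_value: value estimate for the state after the last step.
    Returns (advantages, returns), two lists of the same length.
    All names and defaults here are illustrative assumptions.
    """
    values = list(values) + [next_value]
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk the rollout backwards so each step can reuse the next step's estimate.
    for t in reversed(range(len(rewards))):
        # A finished episode contributes no bootstrapped value.
        mask = 0.0 if dones[t] else 1.0
        # One-step TD error.
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
    # Value targets are the advantages added back onto the value estimates.
    returns = [adv + val for adv, val in zip(advantages, values[:-1])]
    return advantages, returns
```

With `lam=0` the advantages reduce to one-step TD errors, and with `lam=1` they become discounted Monte Carlo returns minus the value baseline; values in between trade bias against variance.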

If you get stuck…

  • Remember that you are not stuck unless you have spent more than a week on a single algorithm. It is perfectly normal not to have all the required knowledge of mathematics and CS.
  • Go through the paper carefully. Try to identify the problem the authors are solving. Understand the high-level idea of the approach first, then read the code (skipping the proofs), and only afterwards go over the mathematical details and proofs.

RL Algorithms

Deep Q Learning tutorial: DQN Adventure: from Zero to State of the Art

Awesome RL libs: rlkit @vitchyr, pytorch-a2c-ppo-acktr @ikostrikov, ACER @Kaixhin

Best RL courses

  • Berkeley deep RL link
  • Deep RL Bootcamp link
  • David Silver's course link
  • Practical RL link