/spinningup

An educational resource to help anyone learn deep reinforcement learning.


Reinforcement Learning from Scratch

Spinning Up is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL). I am grateful for Spinning Up because I learned a lot from it.

Why I Built This

I decided to write my own implementations after reading the article Spinning Up as a Deep RL Researcher, in particular the following paragraph:

Write your own implementations. You should implement as many of the core deep RL algorithms from scratch as you can, with the aim of writing the shortest correct implementation of each. This is by far the best way to develop an understanding of how they work, as well as intuitions for their specific performance characteristics.

I will first re-implement the existing algorithms in openai/spinningup in my preferred code style. Then I will implement some algorithms that are not included there.

My design principles:

  • Write the shortest correct implementation of each core deep RL algorithm.
  • Write more readable code.

Algorithms

  • VPG
  • TRPO
  • PPO
  • DDPG
  • TD3
  • SAC
  • DQN
  • C51
  • QR-DQN
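To give a flavor of what a "shortest correct implementation" can look like, here is a minimal sketch of the discounted reward-to-go computation that policy gradient methods such as VPG commonly use. It is an illustration only and is not taken from this repository; the function name `discounted_returns` and the default value of `gamma` are assumptions made for the example.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted reward-to-go for every step of one episode.

    Hypothetical helper for illustration; not part of this repository.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three steps with reward 1.0 each and a discount factor of 0.5.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

The backward pass keeps the computation linear in the episode length, instead of re-summing the remaining rewards at every step.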

Installation

Creating the python environment

conda create -n spinningup python=3.6
source activate spinningup

Installing Spinning Up

git clone https://github.com/XFFXFF/spinningup.git
cd spinningup
pip install -e .

Running Tests

Training a model

cd spinningup
python -m spinup.algos.ppo --env Pendulum-v0 --seed 0

Plotting the performance (average epoch return)

cd spinningup
python -m spinup.plot data/ppo/Pendulum-v0/seed0

See the page on plotting results for documentation of the plotter.

References

VPG

Vanilla Policy Gradient, OpenAI/Spinningup.
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton et al., 2000.
High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al., 2016.

TRPO

Trust Region Policy Optimization, Schulman et al., 2015.
Advanced policy gradients (natural gradient, importance sampling), Joshua Achiam, 2017.
Trust Region Policy Optimization, OpenAI/Spinningup.

PPO

Proximal Policy Optimization, OpenAI/Spinningup.
Proximal Policy Optimization Algorithms, Schulman et al., 2017.

DDPG

Deep Deterministic Policy Gradient, OpenAI/Spinningup.
Deterministic Policy Gradient Algorithms, Silver et al., 2014.
Continuous Control With Deep Reinforcement Learning, Lillicrap et al., 2016.

TD3

Twin Delayed DDPG, OpenAI/Spinningup.
Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al., 2018.

SAC

Soft Actor-Critic, OpenAI/Spinningup.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., 2018.

DQN

Human-level control through deep reinforcement learning, Mnih et al., 2015.
berkeleydeeprlcourse/homework

C51

A Distributional Perspective on Reinforcement Learning, Bellemare et al., 2017.
Marc G. Bellemare, Pablo Samuel Castro, Carles Gelada, Saurabh Kumar, Subhodeep Moitra. Dopamine, https://github.com/google/dopamine, 2018.

QR-DQN

Distributional Reinforcement Learning with Quantile Regression, Dabney et al., 2017.