
Highly modularized implementation of popular deep RL algorithms in PyTorch

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

DeepRL

Modularized implementation of popular deep RL algorithms in PyTorch. It is easy to switch between classical control tasks (e.g., CartPole) and Atari games with raw pixel inputs.

Implemented algorithms:

  • (Double/Dueling) Deep Q-Learning (DQN)
  • Categorical DQN (C51, Distributional DQN with KL divergence)
  • Quantile Regression DQN
  • (Continuous/Discrete) Synchronous Advantage Actor Critic (A2C)
  • Synchronous N-Step Q-Learning
  • Deep Deterministic Policy Gradient (DDPG, pixel & low-dim-state)
  • (Continuous/Discrete) Synchronous Proximal Policy Optimization (PPO, pixel & low-dim-state)
  • The Option-Critic Architecture (OC)
  • Action Conditional Video Prediction
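To illustrate the "Double" in the first item, here is a minimal sketch of how the Double DQN target differs from the vanilla DQN target. It is written in plain Python rather than PyTorch so that it runs standalone, and it does not cover the Dueling architecture. The function `dqn_targets` and its arguments are hypothetical and are not part of this repo's API.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q_online: Sequence[Sequence[float]],
    next_q_target: Sequence[Sequence[float]],
    gamma: float = 0.99,
    double: bool = True,
) -> List[float]:
    """Compute one-step TD targets for a batch of transitions.

    next_q_online[i][a] and next_q_target[i][a] hold Q(s'_i, a) as estimated
    by the online network and the target network, respectively.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, next_q_online, next_q_target):
        if double:
            # Double DQN: the online network selects the action,
            # the target network evaluates it.
            a_star = max(range(len(q_on)), key=lambda a: q_on[a])
            bootstrap = q_tg[a_star]
        else:
            # Vanilla DQN: the target network both selects and evaluates.
            bootstrap = max(q_tg)
        # Do not bootstrap past a terminal state.
        targets.append(r + (0.0 if done else gamma * bootstrap))
    return targets


# Example: the online net prefers action 1 in the first next-state, but the
# target net values that action at only 0.5, so the Double DQN target is lower.
print(dqn_targets([1.0, 0.0], [False, True], [[1.0, 2.0], [5.0, 0.0]],
                  [[3.0, 0.5], [4.0, 4.0]], gamma=0.5))  # [1.25, 0.0]
```

With `double=False` the same call returns `[2.5, 0.0]`, because the vanilla target takes the maximum of the target network's values. That maximization is the source of the overestimation bias that Double DQN is designed to reduce.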

Asynchronous algorithms (e.g., A3C) have been removed from the current version but are still available in v0.1.

Dependencies

  • macOS 10.12 or Ubuntu 16.04
  • PyTorch v0.4.0
  • Python 3.5 or 3.6
  • Core dependencies: pip install -e .
  • Optional: Roboschool, PyBullet

Remarks

  • There is a very fast DQN implementation that uses an async actor for data generation and an async replay buffer that transfers data to the GPU. Enable it by setting config.async_actor = True and using AsyncReplay. However, with Atari games this implementation may not work on macOS; use Ubuntu or Docker instead.
  • Python 2 is not officially supported after v0.3. However, I expect most of the code still works in Python 2.
  • Although there is a setup.py, which means you can install the repo as a library, this repo was never designed to be a high-level library like Keras. Use it as your codebase instead.
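The async actor described in the first remark is essentially a producer-consumer pattern: one worker generates transitions while the learner fills a replay buffer and samples minibatches, so environment stepping can overlap with learning. Below is a minimal, framework-free sketch of that pattern under stated assumptions. It uses a Python thread and a bounded `queue.Queue`, a hypothetical `ToyEnv` in place of an Atari environment, and a random policy. The names `ToyEnv`, `actor_loop`, and `learner` are illustrative only. This is not the repo's async actor or `AsyncReplay`, which also move data to the GPU.

```python
import queue
import random
import threading
from collections import deque


class ToyEnv:
    """Hypothetical stand-in for a real environment: random rewards, fixed-length episodes."""

    def __init__(self, episode_len=5):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, random.random(), done


def actor_loop(env, transitions, num_steps):
    """Producer: step the environment and push transitions onto a bounded queue."""
    state = env.reset()
    for _ in range(num_steps):
        action = random.randrange(2)  # placeholder for an epsilon-greedy policy
        next_state, reward, done = env.step(action)
        transitions.put((state, action, reward, next_state, done))
        state = env.reset() if done else next_state
    transitions.put(None)  # sentinel: the actor is finished


def learner(num_steps=100, batch_size=8, capacity=1000):
    """Consumer: drain the queue into a replay buffer and sample minibatches."""
    transitions = queue.Queue(maxsize=64)  # bounded, so the actor cannot run arbitrarily far ahead
    replay = deque(maxlen=capacity)        # oldest transitions are evicted first
    actor = threading.Thread(
        target=actor_loop, args=(ToyEnv(), transitions, num_steps), daemon=True
    )
    actor.start()
    updates = 0
    while True:
        item = transitions.get()
        if item is None:
            break
        replay.append(item)
        if len(replay) >= batch_size:
            batch = random.sample(list(replay), batch_size)
            # A real learner would compute a loss on `batch` and take a gradient step here.
            updates += 1
    actor.join()
    return len(replay), updates
```

Calling `learner(num_steps=100, batch_size=8)` consumes all 100 transitions and performs one placeholder update per transition once the buffer holds at least 8 items. A production version would more likely run the actor in a separate process (e.g., via `multiprocessing`). Python threads share the GIL, so CPU-bound environment stepping would otherwise compete with the learner.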

Usage

examples.py contains examples for all the implemented algorithms.

The Dockerfile contains an example environment (with PyBullet and Roboschool, without GPU support).

If you want to cite this repo, please use this BibTeX entry:

@misc{deeprl,
  author = {Zhang, Shangtong},
  title = {Modularized Implementation of Deep RL Algorithms in PyTorch},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/ShangtongZhang/DeepRL}},
}

Curves

BreakoutNoFrameskip-v4


  • This is my synchronous option-critic implementation, not the original one.
  • The curves are not directly comparable, as many hyper-parameters are different.

RoboschoolHopper-v1


  • The DDPG curve shows evaluation performance rather than online performance.

PongNoFrameskip-v4


  • Left: one-step prediction. Right: ground truth.
  • Prediction images were sampled after 110K iterations. I implemented only one-step training for action-conditional video prediction.

References