Highly modularized implementation of popular deep RL algorithms in PyTorch. My principle here is to reuse as many components as possible across different algorithms and to switch easily between classical control tasks like CartPole and Atari games with raw pixel inputs.
Implemented algorithms:
- (Double/Dueling) Deep Q-Learning (DQN)
- Categorical DQN (C51, Distributional DQN with KL Divergence)
- Quantile Regression DQN (Distributional DQN with Wasserstein Distance)
- Synchronous Advantage Actor Critic (A2C)
- Synchronous N-Step Q-Learning
- Deep Deterministic Policy Gradient (DDPG)
- (Continuous/Discrete) Synchronous Proximal Policy Optimization (PPO)
- Action Conditional Video Prediction
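To make the first item in the list above concrete, here is a minimal sketch of how a Double DQN target differs from a standard DQN target. It uses plain Python lists instead of PyTorch tensors so it runs without dependencies. The function names and arguments (`dqn_target`, `double_dqn_target`, `next_q_online`, `next_q_target`) are hypothetical and are not taken from this repo.

```python
def dqn_target(reward, next_q_target, gamma, done):
    """Standard DQN: the target network both selects and evaluates the next action."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    # Greedy action according to the online network's Q-values.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Value of that action according to the target network.
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Illustrative values only: the two networks disagree on the best next action.
    next_q_online = [3.0, 1.0]
    next_q_target = [0.5, 2.0]
    print(dqn_target(1.0, next_q_target, 0.9, False))
    print(double_dqn_target(1.0, next_q_online, next_q_target, 0.9, False))
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of the plain max operator. A batched tensor version would typically follow the same two steps.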
The asynchronous algorithms below have been removed from this repo but can be found in the previous release:
- Async Advantage Actor Critic (A3C)
- Async One-Step Q-Learning
- Async One-Step Sarsa
- Async N-Step Q-Learning
- Continuous A3C
- Distributed Deep Deterministic Policy Gradient (Distributed DDPG, aka D3PG)
- Parallelized Proximal Policy Optimization (P3O, similar to DPPO)
Curves for CartPole are trivial, so I didn't include them here. No fixed random seed is used. The curves are generated in the same manner as OpenAI Baselines (a single run, smoothed over the most recent 100 episodes).
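The smoothing described above can be sketched as follows. This assumes "smoothed" means a trailing mean over at most the last 100 episode returns, which may differ in detail from the actual plotting code. The function name `smooth_returns` is hypothetical.

```python
from collections import deque


def smooth_returns(episode_returns, window=100):
    """Return the trailing mean of episode returns over the last `window` episodes."""
    recent = deque(maxlen=window)  # holds at most `window` most recent returns
    running_sum = 0.0
    smoothed = []
    for ret in episode_returns:
        if len(recent) == window:
            # The oldest return is about to be evicted by append().
            running_sum -= recent[0]
        recent.append(ret)
        running_sum += ret
        # Early episodes are averaged over fewer than `window` values.
        smoothed.append(running_sum / len(recent))
    return smoothed


if __name__ == "__main__":
    print(smooth_returns([1, 2, 3, 4], window=2))  # [1.0, 1.5, 2.5, 3.5]
```

Because each curve comes from a single run without a fixed seed, the smoothed curve reduces per-episode noise but says nothing about variance across runs.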
Left: one-step prediction. Right: ground truth.
The prediction is sampled after 110K iterations, and I only implemented one-step training.
Tested on macOS 10.12 and CentOS 6.8
- OpenAI gym
- PyTorch v0.3.0
- Python 2.7 / 3.6
- Roboschool (Optional)
- DeepMind Control Suite & DMControl2Gym (Optional)
- TensorboardX (Optional)
main.py contains examples for all the implemented algorithms.
- Human Level Control through Deep Reinforcement Learning
- Asynchronous Methods for Deep Reinforcement Learning
- Deep Reinforcement Learning with Double Q-learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Playing Atari with Deep Reinforcement Learning
- HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- Deterministic Policy Gradient Algorithms
- Continuous control with deep reinforcement learning
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Hybrid Reward Architecture for Reinforcement Learning
- Trust Region Policy Optimization
- Proximal Policy Optimization Algorithms
- Emergence of Locomotion Behaviours in Rich Environments
- Action-Conditional Video Prediction using Deep Networks in Atari Games
- A Distributional Perspective on Reinforcement Learning
- Distributional Reinforcement Learning with Quantile Regression
- Some hyper-parameters are from DeepMind Control Suite, OpenAI Baselines and Ilya Kostrikov