DeepRL

Highly modularized implementation of popular deep RL algorithms in PyTorch. My principle here is to reuse as many components as possible across different algorithms and to switch easily between classical control tasks like CartPole and Atari games with raw pixel inputs.

Implemented algorithms:

(Double/Dueling) Deep Q-Learning (DQN)
Categorical DQN (C51, Distributional DQN with KL Divergence)
Quantile Regression DQN (Distributional DQN with Wasserstein Distance)
Synchronous Advantage Actor Critic (A2C)
Synchronous N-Step Q-Learning
Deep Deterministic Policy Gradient (DDPG)
(Continuous/Discrete) Synchronous Proximal Policy Optimization (PPO)
Action Conditional Video Prediction

The asynchronous algorithms below have been removed from this repo but can be found in the previous release:

Async Advantage Actor Critic (A3C)
Async One-Step Q-Learning
Async One-Step Sarsa
Async N-Step Q-Learning
Continuous A3C
Distributed Deep Deterministic Policy Gradient (Distributed DDPG, aka D3PG)
Parallelized Proximal Policy Optimization (P3O, similar to DPPO)

Curves

Curves for CartPole are trivial, so I didn't place them here. No fixed random seed is used. The curves are generated in the same manner as OpenAI baselines (a single run, smoothed over the most recent 100 episodes).
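
As a rough illustration of that smoothing, here is a minimal sketch that assumes "smoothed by recent 100 episodes" means a trailing mean over the last 100 episode returns. The function name `smooth_returns` and its `window` parameter are hypothetical and are not taken from this repo or from OpenAI baselines.

```python
from collections import deque


def smooth_returns(episode_returns, window=100):
    """Trailing mean over the most recent `window` episode returns.

    Hypothetical helper for illustration only; the plotting code actually
    used for the curves may differ.
    """
    recent = deque(maxlen=window)  # holds at most `window` latest returns
    running_sum = 0.0
    smoothed = []
    for ret in episode_returns:
        if len(recent) == window:
            # The oldest value is about to be evicted by append().
            running_sum -= recent[0]
        recent.append(ret)
        running_sum += ret
        # Early episodes are averaged over however many are available.
        smoothed.append(running_sum / len(recent))
    return smoothed


if __name__ == "__main__":
    print(smooth_returns([1, 2, 3, 4], window=2))
```

Because each curve comes from a single run without a fixed seed, a plot smoothed this way shows the trend of one training run rather than an average over seeds.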