DeepRL

Highly modularized implementation of popular deep RL algorithms in PyTorch. My principle here is to reuse as many components as possible across different algorithms, use as few tricks as possible, and switch easily between classical control tasks like CartPole and Atari games with raw pixel inputs.
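
To make the reuse concrete, here is a minimal sketch (my own illustration, not the repo's actual class names) of how a single Q-value head can sit on top of either a fully-connected body for CartPole or a convolutional body for Atari pixels:

```python
import torch.nn as nn

class FCBody(nn.Module):
    """Body for low-dimensional state inputs such as CartPole."""
    def __init__(self, state_dim, hidden=64):
        super(FCBody, self).__init__()
        self.feature_dim = hidden
        self.layers = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())

    def forward(self, x):
        return self.layers(x)

class ConvBody(nn.Module):
    """Body for raw pixel inputs such as stacked Atari frames (84 * 84)."""
    def __init__(self, in_channels=4):
        super(ConvBody, self).__init__()
        self.feature_dim = 512
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU())
        self.fc = nn.Sequential(nn.Linear(64 * 7 * 7, self.feature_dim), nn.ReLU())

    def forward(self, x):
        x = self.conv(x)
        return self.fc(x.view(x.size(0), -1))

class QNetwork(nn.Module):
    """Q-value head that is agnostic to the body it sits on."""
    def __init__(self, body, num_actions):
        super(QNetwork, self).__init__()
        self.body = body
        self.head = nn.Linear(body.feature_dim, num_actions)

    def forward(self, x):
        return self.head(self.body(x))

# Swap the body, keep everything else:
cartpole_q = QNetwork(FCBody(state_dim=4), num_actions=2)
atari_q = QNetwork(ConvBody(in_channels=4), num_actions=6)
```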

Implemented algorithms:

  • Deep Q-Learning (DQN)
  • Double DQN (its target computation is contrasted with vanilla DQN in the sketch after this list)
  • Dueling DQN
  • Async Advantage Actor Critic (A3C)
  • Async One-Step Q-Learning
  • Async One-Step Sarsa
  • Async N-Step Q-Learning
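
As an illustration of how the first two value-based variants differ, here is a hedged sketch of the target computation; `online_net` and `target_net` are assumed to be Q-networks mapping a batch of states to per-action values, and the names are mine, not the repo's:

```python
def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    # Vanilla DQN: the target network both selects and evaluates
    # the greedy next action.
    next_q = target_net(next_state).max(dim=1)[0]
    return reward + gamma * (1 - done) * next_q

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online network selects the greedy action, while the
    # target network evaluates it, reducing overestimation bias.
    best_action = online_net(next_state).max(dim=1)[1].unsqueeze(1)
    next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1 - done) * next_q
```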

Curves

Curves for CartPole are trivial, so I didn't include them here.

DQN, Double DQN, Dueling DQN

[Figure: training curves for Breakout and Pong]

The network and parameters here are exactly the same as in the DeepMind Nature paper. The training curve is smoothed with a window of size 100. All models were trained on a server with a Xeon E5-2620 v3 and a Titan X. For Breakout, testing is triggered every 1000 episodes with 50 repetitions; in total, 16M frames took about 4 days and 10 hours. For Pong, testing is triggered every 10 episodes with no repetition; in total, 4M frames took about 18 hours.
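
For reference, the window smoothing mentioned above can be implemented as a simple moving average; this is a minimal sketch assuming episode returns are collected in a list, not the repo's plotting code:

```python
import numpy as np

def smooth(returns, window=100):
    # Moving average over a fixed window; shorter histories pass through.
    returns = np.asarray(returns, dtype=float)
    if len(returns) < window:
        return returns
    kernel = np.ones(window) / window
    return np.convolve(returns, kernel, mode='valid')
```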

A3C

[Figure: A3C test curve]

The network I used here is a smaller one with only 42 * 42 inputs; although the network used for DQN also works here, it is quite slow.

Training took about 2 hours (16 processes) on a server with two Xeon E5-2620 v3 CPUs. This is the test curve; testing is triggered in a separate deterministic test process every 50K frames.
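
For concreteness, a small actor-critic network for 42 * 42 inputs might look like the sketch below; the layer sizes are an assumption modeled on common small A3C architectures, not necessarily the ones used here:

```python
import torch.nn as nn
import torch.nn.functional as F

class SmallA3CNet(nn.Module):
    def __init__(self, in_channels, num_actions):
        super(SmallA3CNet, self).__init__()
        # Four 3x3 stride-2 convolutions shrink 42x42 down to 3x3.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ELU())
        self.policy = nn.Linear(32 * 3 * 3, num_actions)  # actor head
        self.value = nn.Linear(32 * 3 * 3, 1)             # critic head

    def forward(self, x):
        feats = self.conv(x).view(x.size(0), -1)
        return F.softmax(self.policy(feats), dim=1), self.value(feats)
```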

Dependency

  • OpenAI Gym
  • PyTorch
  • PIL (pip install Pillow)
  • Python 2.7 (I haven't tested with Python 3)

Usage

Detailed usage and all training details can be found in main.py.

References