Near32/PYTORCH_RL

Some Reinforcement Learning algorithms in PyTorch.

Deep Reinforcement Learning with PyTorch

Deep Q-Network (DQN)

This implementation of the Deep Q-Network ("Human-level control through deep reinforcement learning") can be augmented with the following features :

"Prioritized Experience Replay"
"Dueling Deep Q-Network"
"Double Deep Q-Network"
a multi-threaded "Distributed Architecture", though with a single shared replay memory.
"Hindsight Experience Replay"

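As an illustration of how the "Double Deep Q-Network" option changes the learning target, here is a minimal sketch in plain Python. It is not taken from this repository: the function names, the discount factor `gamma`, and the list-based Q-values are assumptions made for the example. Vanilla DQN lets the target network both pick and evaluate the next action, whereas Double DQN picks the action with the online network and evaluates it with the target network.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """Vanilla DQN target: the target network selects and evaluates the next action.

    gamma is an assumed discount factor, not a value stated in this README.
    """
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network selects, the target network evaluates."""
    if done:
        return reward
    # Index of the greedy action according to the online network.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for a two-action environment.
q_online_next = [1.0, 2.0]   # online network prefers action 1
q_target_next = [3.0, 0.5]   # target network overestimates action 0

vanilla = dqn_target(1.0, False, q_target_next)
double = double_dqn_target(1.0, False, q_online_next, q_target_next)
```

In this toy case the vanilla target uses the largest target-network value, while the Double DQN target uses the target-network value of the action chosen by the online network, which is the mechanism meant to reduce overestimation.
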
Experiment : CartPole-v1 :

Adam
learning rate : 1e-4
minibatch size : 128
replay memory capacity : 25e3
prioritized experience replay exponent $\alpha$ : 0.5
number of threads/workers : 1
double DQN : [x]
hindsight experience replay : [ ]
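
To make the role of the prioritized experience replay exponent $\alpha$ concrete, the sketch below computes the sampling probabilities $P(i) = p_i^\alpha / \sum_k p_k^\alpha$ from the "Prioritized Experience Replay" paper. It is a plain-Python illustration under assumed names (`sampling_probabilities`, `sample_indices`) and does not reproduce this repository's replay memory. With $\alpha = 0.5$ the priorities are flattened by a square root; with $\alpha = 0.0$ sampling becomes uniform, which is why that value means "no priority".

```python
import random


def sampling_probabilities(priorities, alpha):
    """Return P(i) = p_i**alpha / sum_k p_k**alpha for strictly positive priorities."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(priorities, alpha, batch_size, rng=random):
    """Draw transition indices (with replacement) according to the priorities."""
    probs = sampling_probabilities(priorities, alpha)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)


# Hypothetical priorities (e.g. absolute TD errors) of three stored transitions.
priorities = [1.0, 4.0, 9.0]

probs_half = sampling_probabilities(priorities, alpha=0.5)  # proportional to 1, 2, 3
probs_zero = sampling_probabilities(priorities, alpha=0.0)  # uniform

batch = sample_indices(priorities, alpha=0.5, batch_size=128, rng=random.Random(0))
```

A real implementation would typically use a sum-tree for efficient sampling and importance-sampling weights to correct the induced bias; both are left out here to keep the sketch short.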

Deep Deterministic Policy Gradient (DDPG)

This implementation of the Deep Deterministic Policy Gradient ("Continuous Control with Deep Reinforcement Learning") can be augmented with the following features :

"Prioritized Experience Replay"
"Dueling Deep Q-Network"
a multi-threaded architecture ("A2C"/"A3C").
"Hindsight Experience Replay"

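For readers unfamiliar with "Hindsight Experience Replay", the following sketch shows the simplest relabeling strategy from that paper (often called "final"): every transition of an episode is stored a second time with the goal replaced by what the agent actually achieved at the end. The transition layout, the key names, and the sparse reward function are assumptions for illustration only, not this repository's data structures.

```python
def sparse_reward(achieved_goal, goal):
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved_goal == goal else -1.0


def relabel_with_final_goal(episode, reward_fn=sparse_reward):
    """Return copies of the episode's transitions relabeled with the final achieved goal.

    Each transition is a dict with (hypothetical) keys:
    'state', 'action', 'next_state', 'achieved_goal' (reached after the action),
    'goal' (originally desired) and 'reward'.
    """
    final_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        new_transition = dict(transition)  # shallow copy; the original stays intact
        new_transition["goal"] = final_goal
        new_transition["reward"] = reward_fn(transition["achieved_goal"], final_goal)
        relabeled.append(new_transition)
    return relabeled


# A toy two-step episode that never reached the desired goal 5.
episode = [
    {"state": 0, "action": 1, "next_state": 1, "achieved_goal": 1, "goal": 5, "reward": -1.0},
    {"state": 1, "action": 1, "next_state": 2, "achieved_goal": 2, "goal": 5, "reward": -1.0},
]
hindsight = relabel_with_final_goal(episode)
```

Both the original and the relabeled transitions would then be pushed to the replay memory, so that the failed episode still yields at least one transition with a non-negative reward.
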
Experiment : Pendulum-v0 :

Adam
learning rate : 1e-4
minibatch size : 128
soft update $\tau$ : 1e-3
replay memory capacity : 1e6
prioritized experience replay exponent $\alpha$ : 0.0 (no priority)
number of threads/workers : 1
hindsight experience replay : [ ]
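
The soft update coefficient $\tau$ controls how quickly the target networks track the learned networks, following the rule $\theta' \leftarrow \tau \theta + (1 - \tau) \theta'$ from the DDPG paper. The sketch below applies that rule to flat lists of floats; the function name and the list representation are assumptions chosen to keep the example dependency-free, and a PyTorch version would iterate over the parameter tensors of the two networks instead.

```python
def soft_update(target_params, online_params, tau=1e-3):
    """Return the new target parameters: tau * online + (1 - tau) * target."""
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_target, w_online in zip(target_params, online_params)
    ]


# Hypothetical parameter vectors of a target network and its online counterpart.
target = [0.0, 1.0]
online = [1.0, 1.0]

# One update moves each target weight 0.1% of the way towards the online weight.
updated = soft_update(target, online, tau=1e-3)

# The remaining gap shrinks by a factor (1 - tau) per update.
params = list(target)
for _ in range(1000):
    params = soft_update(params, online, tau=1e-3)
remaining_gap = online[0] - params[0]
```

With $\tau = 10^{-3}$ the gap between the two networks is multiplied by 0.999 at every update, so roughly a thousand updates are needed to close about 63% of it.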

Proximal Policy Optimization (PPO)

This implementation of Proximal Policy Optimization ("Proximal Policy Optimization Algorithms") can be augmented with the following features :

"Prioritized Experience Replay"
"Dueling Deep Q-Network"
a multi-threaded architecture ("A2C"/"A3C").
"Hindsight Experience Replay"

Experiment : Pendulum-v0 :

Adam
learning rate : 1e-6
minibatch size : 64
soft update $\tau$ : 1e-3
replay memory capacity : 25e3
prioritized experience replay exponent $\alpha$ : 0.0 (no priority)
number of threads/workers : 1
hindsight experience replay : [ ]
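
For reference, the core of PPO is the clipped surrogate objective $L^{CLIP} = \mathbb{E}\left[\min\left(r_t A_t, \mathrm{clip}(r_t, 1 - \epsilon, 1 + \epsilon) A_t\right)\right]$, where $r_t$ is the probability ratio between the new and the old policy. The sketch below evaluates it on plain Python lists. The clipping range `epsilon` is not listed among the hyperparameters above, so the value 0.2 (the default suggested in the PPO paper) is an assumption, as are the function and argument names.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean clipped surrogate objective (to be maximized).

    epsilon=0.2 is an assumed clipping range, not a value from this README.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


# Two hypothetical samples: the ratio is 1.5 with a positive advantage,
# and 0.5 with a negative advantage.
new_log_probs = [math.log(1.5), math.log(0.5)]
old_log_probs = [0.0, 0.0]
advantages = [1.0, -1.0]

objective = clipped_surrogate(new_log_probs, old_log_probs, advantages)
```

In the first sample the gain is capped at $1 + \epsilon$, and in the second the penalty is computed with the clipped ratio $1 - \epsilon$, so neither sample rewards moving the policy further away from the old one. A training loop would minimize the negative of this quantity.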