This implementation of the Deep Q-Network ("Human-level control through deep reinforcement learning") can be augmented with the following features :
- "Prioritized Experience Replay"
- "Dueling Deep Q-Network"
- "Double Deep Q-Network"
- a multi-threaded "Distributed Architecture", in which all workers share a single replay memory.
- "Hindsight Experience Replay"
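
As a rough illustration of what the "Double Deep Q-Network" option changes, the sketch below compares the standard DQN target with the Double DQN target for a single transition. The function names and the discount factor `gamma` are assumptions made for this example and are not taken from this repository.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward: float, done: bool,
               q_target_next: Sequence[float], gamma: float = 0.99) -> float:
    """Standard DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, done: bool,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]
```

Decoupling action selection from action evaluation is intended to reduce the overestimation bias of the plain `max` target.
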
Experiment : CartPole-v1 :
- optimizer : Adam
- learning rate : 1e-4
- minibatch size : 128
- replay memory capacity : 25e3
- prioritized experience replay exponent $\alpha$ : 0.5
- number of threads/workers : 1
- double DQN : [x]
- hindsight experience replay : [ ]
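
To make the role of the exponent $\alpha$ concrete, here is a minimal sketch of how prioritized experience replay typically turns priorities into sampling probabilities, $P(i) = p_i^\alpha / \sum_k p_k^\alpha$. The helper names are hypothetical and the priorities are assumed to be strictly positive; this is not necessarily how the repository implements it.

```python
import random
from typing import List, Sequence


def sampling_probabilities(priorities: Sequence[float], alpha: float) -> List[float]:
    """P(i) = p_i**alpha / sum_k p_k**alpha (priorities assumed > 0)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(priorities: Sequence[float], alpha: float,
                   batch_size: int) -> List[int]:
    """Draw a minibatch of transition indices according to P(i)."""
    probs = sampling_probabilities(priorities, alpha)
    return random.choices(range(len(priorities)), weights=probs, k=batch_size)


if __name__ == "__main__":
    priorities = [1.0, 4.0, 9.0]
    print(sampling_probabilities(priorities, alpha=0.5))  # favors high priorities
    print(sampling_probabilities(priorities, alpha=0.0))  # uniform sampling
```

With $\alpha = 0$ every transition is equally likely, which is why the value 0.0 is described as "no priority" in the experiment settings.
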
This implementation of the Deep Deterministic Policy Gradient ("Continuous Control with Deep Reinforcement Learning") can be augmented with the following features :
- "Prioritized Experience Replay"
- "Dueling Deep Q-Network"
- a multi-threaded architecture ("A2C"/"A3C").
- "Hindsight Experience Replay"
Experiment : Pendulum-v0 :
- optimizer : Adam
- learning rate : 1e-4
- minibatch size : 128
- soft update $\tau$ : 1e-3
- replay memory capacity : 1e6
- prioritized experience replay exponent $\alpha$ : 0.0 (no priority)
- number of threads/workers : 1
- hindsight experience replay : [ ]
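
The soft update coefficient $\tau$ listed above controls how quickly the target networks track the learned networks, usually via $\theta' \leftarrow \tau \theta + (1 - \tau) \theta'$. The sketch below applies that rule to plain lists of floats. The function name is an assumption for this example, and real networks would apply the same rule to each parameter tensor.

```python
from typing import List, Sequence


def soft_update(target_params: Sequence[float],
                online_params: Sequence[float],
                tau: float = 1e-3) -> List[float]:
    """Return the new target parameters: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_target
            for w, w_target in zip(online_params, target_params)]


if __name__ == "__main__":
    target = [0.0, 1.0]
    online = [1.0, 1.0]
    # With tau = 1e-3 the target moves only 0.1% of the way toward the online weights.
    print(soft_update(target, online, tau=1e-3))
```

A small $\tau$ such as 1e-3 keeps the bootstrapped targets slowly varying, which tends to stabilize training.
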
This implementation of the "Proximal Policy Optimization Algorithm" can be augmented with the following features :
- "Prioritized Experience Replay"
- "Dueling Deep Q-Network"
- a multi-threaded architecture ("A2C"/"A3C").
- "Hindsight Experience Replay"
Experiment : Pendulum-v0 :
- optimizer : Adam
- learning rate : 1e-6
- minibatch size : 64
- soft update $\tau$ : 1e-3
- replay memory capacity : 25e3
- prioritized experience replay exponent $\alpha$ : 0.0 (no priority)
- number of threads/workers : 1
- hindsight experience replay : [ ]
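
For readers unfamiliar with PPO, the following sketch shows the clipped surrogate objective for a single sample, $\min\big(r A,\ \mathrm{clip}(r, 1 - \epsilon, 1 + \epsilon)\, A\big)$, where $r$ is the probability ratio between the new and old policies and $A$ is the advantage estimate. The clipping range `epsilon = 0.2` is an assumption for illustration and is not listed among the settings above.

```python
def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO clipped objective for one sample (to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s)
    advantage : advantage estimate for the sampled action
    epsilon   : clipping range (assumed value, not from this repository)
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped at (1 + epsilon) * advantage.
    print(clipped_surrogate(ratio=1.5, advantage=2.0))
    # A small ratio with a negative advantage is bounded by (1 - epsilon) * advantage.
    print(clipped_surrogate(ratio=0.5, advantage=-1.0))
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data.
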