/D4PG

PyTorch implementation of D4PG with the SOTA IQN Critic instead of C51. Implementation includes also the extensions Munchausen RL and D2RL which can be added to D4PG to improve its performance.

Primary LanguagePython

PyTorch implementation of D4PG

This repository contains a PyTorch implementation of D4PG with IQN as the improved distributional Critic instead of C51. Also the extentions Munchausen RL and D2RL are added and can be combined with D4PG as needed.

Dependencies

Trained and tested on:

Python 3.6
PyTorch 1.4.0  
Numpy 1.15.2 
gym 0.10.11 

How to use:

The new script combines all extensions and the add-ons can be simply added by setting the corresponding flags.

python run.py -info your_run_info

Parameter: To see the options: python run.py -h

Observe training results

tensorboard --logdir=runs

Added Extensions:

  • Prioritized Experience Replay [X]
  • N-Step Bootstrapping [X]
  • D2RL [X]
  • Distributional IQN Critic [X]
  • Munchausen RL [X]
  • Parallel-Environments [X]

Results

Environment: Pendulum

Pendulum

Below you can see how IQN reduced the variance of the Critic loss:

CriticLoss

Environment: LunarLander

LunarLander

Notes:

  • Performance depends a lot on good hyperparameter->> tau for Per bigger (pendulum 1e-2) for regular replay (1e-3)

  • BatchNorm had good impact on the overall performance (!)