DSPG

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

TensorFlow implementation of Deterministic Soft Policy Gradients (DSPG).

The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained using TensorFlow 1.6.0 and Python 3.6.

Some of the results in the paper can be reproduced exactly by running:

./run_Walker2d.sh

Hyperparameters can be changed by passing different arguments to main.py.
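For readers unfamiliar with this pattern, the following is a minimal sketch of how a training script can expose hyperparameters as command-line arguments using Python's standard argparse module. It is not taken from main.py: every flag name and default value below (`--env_name`, `--seed`, `--learning_rate`, `--batch_size`) is a hypothetical placeholder, so check main.py itself for the arguments it actually accepts.

```python
import argparse


def build_parser():
    """Build a parser with HYPOTHETICAL hyperparameter flags.

    None of these flag names or defaults come from main.py; they only
    illustrate the general pattern.
    """
    parser = argparse.ArgumentParser(
        description="Illustrative hyperparameter parser (not the real main.py)"
    )
    # Assumed flag: which Gym environment to train on.
    parser.add_argument("--env_name", type=str, default="Walker2d-v2")
    # Assumed flag: random seed for reproducibility.
    parser.add_argument("--seed", type=int, default=0)
    # Assumed flag: optimizer step size.
    parser.add_argument("--learning_rate", type=float, default=3e-4)
    # Assumed flag: number of transitions per update.
    parser.add_argument("--batch_size", type=int, default=256)
    return parser


def parse_hyperparameters(argv):
    """Parse an explicit list of argument strings into a namespace."""
    return build_parser().parse_args(argv)


# Override two values and keep the remaining defaults.
example_args = parse_hyperparameters(["--seed", "3", "--learning_rate", "1e-3"])
print(vars(example_args))
```

Under these assumptions, a shell script such as run_Walker2d.sh would typically just call the training script with a fixed set of such arguments, which is what makes a run repeatable.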