PyTorch implementation of Distributed Distributional Deterministic Policy Gradients (https://arxiv.org/abs/1804.08617).
The implementation was tested on environments from OpenAI Gym.
D4PG and D3PG implementations with the following features:
- the learner, sampler and agents run in separate processes
- exploiter agent(s) exist, which act on the target network without adding noise to actions
- the GPU is held only by the exploiters; all other exploration processes run on the CPU
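
The difference between exploring agents and exploiters described above can be sketched as follows. This is a minimal illustration, not the project's code: `AgentSpec`, `build_agent_specs`, `select_action`, the Gaussian noise model and the default `noise_std` are all hypothetical names and assumptions. The policy is assumed to be any callable that maps a state to a list of action values; for an exploiter it would wrap the target network.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class AgentSpec:
    """Hypothetical description of one agent process (not the project's API)."""
    name: str
    explore: bool      # True: adds noise to actions; False: exploiter, no noise
    noise_std: float   # std of the Gaussian action noise (assumed noise model)


def build_agent_specs(n_explorers: int, n_exploiters: int,
                      noise_std: float = 0.3) -> List[AgentSpec]:
    """Create specs for the exploring agents followed by the exploiters."""
    specs = [AgentSpec(f"explorer_{i}", True, noise_std) for i in range(n_explorers)]
    specs += [AgentSpec(f"exploiter_{i}", False, 0.0) for i in range(n_exploiters)]
    return specs


def select_action(policy: Callable[[Sequence[float]], Sequence[float]],
                  state: Sequence[float],
                  spec: AgentSpec,
                  low: float = -1.0,
                  high: float = 1.0,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Return the policy's action, perturbed with noise only for explorers."""
    rng = rng or random
    action = list(policy(state))
    if spec.explore:
        # Exploring agents perturb every action dimension independently.
        action = [a + rng.gauss(0.0, spec.noise_std) for a in action]
    # Clip to the action bounds in both cases.
    return [min(high, max(low, a)) for a in action]
```

With `build_agent_specs(2, 1)`, the first two specs add noise and the last one returns the clipped policy output unchanged, which is the behaviour the exploiter bullet describes.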
The project was tested on Ubuntu 18.04 with a 4-core Intel i5 and an Nvidia GTX 1080 Ti.
Run training with: python train.py --config configs/pendulum_d4pg.yml
Run the tests with: python -m unittest discover
All results were obtained with the configs in the configs directory.
- DDPG [https://arxiv.org/abs/1509.02971]
- Distributional Perspective on RL [https://arxiv.org/abs/1707.06887]
- D4PG [https://arxiv.org/abs/1804.08617]