D4PG-pytorch
PyTorch implementation of Distributed Distributional Deterministic Policy Gradients (https://arxiv.org/abs/1804.08617).
About
The project is under active renovation. For the old code, in which the D4PG
algorithm works with multiprocessing queues and mujoco_py,
please refer to the branch d4pg_legacy.
Roadmap 🏗
- Switching to mujoco 3.1.1
- Replacing multiprocessing queues with RabbitMQ for distributed RL
- Baselines with DDPG, TQC for dm_control for 1M steps
- Baselines with Distributed DDPG for dm_control
- Bringing back D4PG logic on top of TQC
- Tests
- New Algos
Installation
pip install -r requirements.txt
cd src && pip install -e .
Usage
To run DDPG in a single process:
python src/oprl/configs/ddpg.py --env walker-walk
To run distributed DDPG, first start RabbitMQ:
docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.12-management
Then, in a separate terminal, run training:
python src/oprl/configs/d3pg.py --env walker-walk
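Because distributed training depends on the broker being up, it can help to check that the RabbitMQ port is reachable before launching the training script. The snippet below is a minimal sketch, not part of this repository: the helper name `wait_for_port` is hypothetical, it uses only the Python standard library, and it assumes the broker is published on localhost:5672 as in the docker command above. An open TCP port does not guarantee that RabbitMQ has finished starting, so treat a positive result as a hint rather than a readiness guarantee.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it is not provided by this project.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Try to open (and immediately close) a TCP connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: give up after the deadline,
            # otherwise sleep briefly and retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port("localhost", 5672)` in a small launcher script and starting training only when it returns True avoids a failed start when the container is not running yet.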
Results
Results for single-process DDPG and TQC:
References
- Continuous control with deep reinforcement learning (https://arxiv.org/abs/1509.02971)
- Distributed Distributional Deterministic Policy Gradients (https://arxiv.org/abs/1804.08617)