D4PG-pytorch

PyTorch implementation of Distributed Distributional Deterministic Policy Gradients (https://arxiv.org/abs/1804.08617).
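In the D4PG paper, the critic outputs a categorical distribution over a fixed set of return atoms. The Bellman-shifted target distribution r + γZ' is then projected back onto that support, in the style of C51. Below is a minimal NumPy sketch of that projection step. It assumes a support of `n_atoms` evenly spaced atoms on `[v_min, v_max]`. The function name and its arguments are hypothetical and are not this repository's API. For N-step returns, pass the discounted N-step reward and γ^N as `gamma`.

```python
import numpy as np


def project_distribution(next_probs, rewards, dones, gamma, v_min, v_max, n_atoms):
    """Hypothetical helper: project r + gamma * Z' onto the fixed categorical support.

    next_probs: (batch, n_atoms) target-critic probabilities at (s', mu'(s')).
    rewards, dones: (batch,) arrays; dones is 1.0 where the episode terminated.
    """
    support = np.linspace(v_min, v_max, n_atoms)  # fixed atoms z_j
    delta_z = (v_max - v_min) / (n_atoms - 1)

    # Bellman-shifted atoms, clipped to the support range
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * support[None, :]
    tz = np.clip(tz, v_min, v_max)

    # Fractional position of each shifted atom on the support (clip guards float error)
    b = np.clip((tz - v_min) / delta_z, 0, n_atoms - 1)
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)

    projected = np.zeros_like(next_probs)
    batch_idx = np.arange(next_probs.shape[0])[:, None].repeat(n_atoms, axis=1)
    # Split each atom's mass between its two neighbours in proportion to distance
    np.add.at(projected, (batch_idx, lower), next_probs * (upper - b))
    np.add.at(projected, (batch_idx, upper), next_probs * (b - lower))
    # If b falls exactly on an atom, both weights above are 0: give it the full mass
    exact = lower == upper
    np.add.at(projected, (batch_idx[exact], lower[exact]), next_probs[exact])
    return projected
```

The sketch uses linear interpolation between neighbouring atoms. As a result, it preserves total probability mass, and it preserves the mean whenever no clipping occurs. The critic is then trained with the cross-entropy between this projected target and its own predicted distribution.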

About

The project is undergoing active renovation. For the old code, with the D4PG algorithm built on multiprocessing queues and mujoco_py, please refer to the d4pg_legacy branch.

Roadmap 🏗

Switching to mujoco 3.1.1
Replacing multiprocessing queues with RabbitMQ for distributed RL
Baselines with DDPG and TQC for dm_control for 1M steps
Baselines with Distributed DDPG for dm_control
Bringing back D4PG logic on top of TQC
Tests
New Algos

Installation

pip install -r requirements.txt
cd src && pip install -e .

Usage

To run DDPG in a single process

python src/oprl/configs/ddpg.py --env walker-walk
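Two update rules define DDPG (Lillicrap et al.). The first is the critic's bootstrapped target, y = r + γ(1 − d)·Q'(s', μ'(s')), computed with the target networks. The second is the Polyak (soft) update of those target networks. The NumPy sketch below shows both. It is not this repository's code: the function names are hypothetical, and parameters are modelled as a dict of arrays. The default `gamma` and `tau` values are assumptions; the DDPG paper itself used τ = 0.001.

```python
import numpy as np


def critic_target(rewards, dones, next_q, gamma=0.99):
    """Hypothetical helper: y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    next_q is the target critic evaluated at the target actor's action.
    """
    return rewards + gamma * (1.0 - dones) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Hypothetical helper: Polyak averaging theta' <- tau * theta + (1 - tau) * theta'."""
    return {
        name: tau * online_params[name] + (1.0 - tau) * target_params[name]
        for name in target_params
    }
```

A small `tau` makes the target networks trail the online networks slowly. This is what stabilises the bootstrapped critic target.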

To run distributed DDPG

Run RabbitMQ

docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.12-management
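The container exposes the AMQP port 5672 and the management UI on 15672. Distributed actors can push experience to the learner through the broker. The sketch below shows one way an actor might do this with the `pika` client. The queue name `transitions`, the tuple layout, and the use of `pickle` are assumptions made for illustration. The repository's actual message format is not documented here. `pika` is imported lazily so that the serialization helpers run without a broker.

```python
import pickle

QUEUE_NAME = "transitions"  # hypothetical queue name


def encode_transition(obs, action, reward, next_obs, done):
    """Serialize one transition to bytes so it can be sent as a message body."""
    return pickle.dumps((obs, action, reward, next_obs, done))


def decode_transition(body):
    """Inverse of encode_transition; only unpickle messages from trusted producers."""
    return pickle.loads(body)


def publish(body, host="localhost"):
    """Send one message to the RabbitMQ broker started by the docker command above."""
    import pika  # lazy import: only needed when actually talking to the broker

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    try:
        channel = connection.channel()
        channel.queue_declare(queue=QUEUE_NAME)
        channel.basic_publish(exchange="", routing_key=QUEUE_NAME, body=body)
    finally:
        connection.close()
```

On the learner side, a consumer would read from the same queue and call `decode_transition` before inserting into the replay buffer.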

Run training

python src/oprl/configs/d3pg.py --env walker-walk

Results

Results for single-process DDPG and TQC:

References

Continuous control with deep reinforcement learning [https://arxiv.org/abs/1509.02971]
Distributed Distributional Deterministic Policy Gradients [https://arxiv.org/abs/1804.08617]