D4PG-pytorch

PyTorch implementation of Distributed Distributional Deterministic Policy Gradients (https://arxiv.org/abs/1804.08617).

[Figure: D4PG architecture (d4pg_arch)]

About

The project is under active renovation. For the old code, in which the D4PG algorithm runs on multiprocessing queues and mujoco_py, please refer to the d4pg_legacy branch.

Roadmap 🏗

  • Switching to mujoco 3.1.1
  • Replacing multiprocessing queues with RabbitMQ for distributed RL
  • Baselines with DDPG and TQC on dm_control for 1M steps
  • Baselines with Distributed DDPG for dm_control
  • Bringing back D4PG logic on top of TQC
  • Tests
  • New algorithms
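
One practical consequence of moving from multiprocessing queues to a message broker such as RabbitMQ is that data exchanged between processes has to travel as bytes. The sketch below illustrates that idea with the standard library only. The `Transition` type, its field names, and the JSON encoding are assumptions made for illustration; they are not taken from this repository and may differ from what the code actually sends.

```python
import json
from typing import NamedTuple, Tuple


class Transition(NamedTuple):
    """One environment step. Field names are hypothetical, not from the repository."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    next_state: Tuple[float, ...]
    done: bool


def encode(transition: Transition) -> bytes:
    """Serialize a transition into a byte string suitable for a message body."""
    return json.dumps(transition._asdict()).encode("utf-8")


def decode(body: bytes) -> Transition:
    """Rebuild a transition from a message body produced by encode()."""
    data = json.loads(body.decode("utf-8"))
    return Transition(
        state=tuple(data["state"]),
        action=tuple(data["action"]),
        reward=float(data["reward"]),
        next_state=tuple(data["next_state"]),
        done=bool(data["done"]),
    )
```

JSON is used here only because it is easy to inspect; a binary format would likely be a better fit for large observation arrays.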

Installation

pip install -r requirements.txt
cd src && pip install -e .

Usage

To run DDPG in a single process

python src/oprl/configs/ddpg.py --env walker-walk
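
The value `walker-walk` looks like a dm_control domain and task joined by a hyphen. Assuming that convention holds (this README does not state it explicitly), the sketch below shows how such a name could be split and passed to `dm_control.suite.load`. The helper names `split_env_name` and `load_dm_control_env` are hypothetical and are not part of this repository.

```python
from typing import Tuple


def split_env_name(env: str) -> Tuple[str, str]:
    """Split 'walker-walk' into ('walker', 'walk') at the first hyphen."""
    domain, sep, task = env.partition("-")
    if not sep or not domain or not task:
        raise ValueError(f"expected '<domain>-<task>', got {env!r}")
    return domain, task


def load_dm_control_env(env: str):
    """Load the task through dm_control. Requires dm_control and MuJoCo to be installed."""
    # Imported lazily so that split_env_name works without dm_control.
    from dm_control import suite

    domain, task = split_env_name(env)
    return suite.load(domain_name=domain, task_name=task)
```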

To run distributed DDPG, first start RabbitMQ:

docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.12-management
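
Before starting training it can help to confirm that the broker port published by the command above (5672) accepts connections. The following is a minimal sketch using only the standard library; the function name `broker_is_reachable` is hypothetical, and the check only verifies that a TCP connection can be opened, not that RabbitMQ is fully initialized.

```python
import socket


def broker_is_reachable(host: str = "localhost", port: int = 5672, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("RabbitMQ reachable:", broker_is_reachable())
```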

Then run training:

python src/oprl/configs/d3pg.py --env walker-walk

Results

Results for single-process DDPG and TQC: [Figure: ddpg_tqc_eval]

References