
DDPG with Hindsight Experience Replay (HER)

This is a PyTorch implementation of DDPG with Hindsight Experience Replay (HER). The code is adapted from @julianalverio's reimplementation. For more details, please refer to the original paper.
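
For background, here is a minimal, hypothetical sketch of a single DDPG update step in PyTorch: the critic regresses toward a bootstrapped target, the actor ascends the critic's value estimate, and target networks track the online networks via Polyak averaging. All names, network sizes, and hyperparameters are illustrative assumptions, not this repository's code.

import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 10, 4
gamma, tau = 0.98, 0.05

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(obs, act, rew, next_obs):
    # Critic: regress Q(s, a) toward r + gamma * Q'(s', pi'(s')).
    with torch.no_grad():
        next_q = critic_target(torch.cat([next_obs, actor_target(next_obs)], dim=1))
        target_q = rew + gamma * next_q
    critic_loss = F.mse_loss(critic(torch.cat([obs, act], dim=1)), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize Q(s, pi(s)) by minimizing its negative.
    actor_loss = -critic(torch.cat([obs, actor(obs)], dim=1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Polyak-average the target networks toward the online networks.
    for net, target in ((actor, actor_target), (critic, critic_target)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)

batch = 32  # one update on a random dummy batch
ddpg_update(torch.randn(batch, obs_dim), torch.randn(batch, act_dim),
            torch.randn(batch, 1), torch.randn(batch, obs_dim))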

Getting Started

You can take the following steps to run DDPG (with HER) on the OpenAI gym Fetch environments.

Setting up a conda environment

conda create -n ddpg python=3.6 tensorflow mpi4py
conda activate ddpg

Installing the required packages

LD_LIBRARY_PATH=$HOME/.mujoco/mujoco200/bin:$LD_LIBRARY_PATH pip install mujoco_py==2.0.2.4 torch tensorboard==1.14.0 opencv-python scipy GPUtil cloudpickle requests future gym[all]
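
The LD_LIBRARY_PATH prefix lets mujoco_py locate the MuJoCo 2.0 shared libraries; alternatively, you can export the variable once in your shell profile so you don't need to prefix every command.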

Training the agent

After setting up the packages, you can run run.py as follows to train the agent.

LD_LIBRARY_PATH=$HOME/.mujoco/mujoco200/bin:$LD_LIBRARY_PATH python run.py --num_timesteps 10000000 --num_workers 25 --env FetchPickAndPlace-v1 --replay_strategy future
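
The --num_workers flag in the command above controls how many rollout workers run in parallel. As a rough illustration, here is a minimal, hypothetical sketch of workers averaging a rollout statistic with mpi4py (installed in the conda environment above); it is an assumption about the general design, not this repository's actual code.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Stand-in for a statistic this worker computed from its own rollouts.
local_success = float(rank % 2)

# Sum across all workers, then divide to get the global mean.
mean_success = comm.allreduce(local_success, op=MPI.SUM) / size
if rank == 0:
    print('mean success rate across workers:', mean_success)

A script like this would be launched with, e.g., mpirun -n 25 python sketch.py.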

The main parameters you can set are:

  • env: The OpenAI gym Fetch environment to train on.
  • replay_strategy: The goal relabeling strategy. Set it to future to use HER, or to none to run the original DDPG without relabeling (a sketch of the future strategy follows this list).
  • num_workers: The number of parallel rollout workers.
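
To make the future strategy concrete, here is a minimal, hypothetical sketch of HER-style goal relabeling. The episode layout, key names, and reward function below are illustrative assumptions, not this repository's actual code.

import numpy as np

def her_relabel(episode, reward_fn, k=4):
    # episode: dict with 'obs' and 'achieved_goal' arrays of length T+1
    # and 'acts' of length T, where T is the number of transitions.
    T = len(episode['acts'])
    transitions = []
    for t in range(T):
        for _ in range(k):
            # Substitute a goal the agent actually achieved later in
            # the same episode.
            future_t = np.random.randint(t + 1, T + 1)
            new_goal = episode['achieved_goal'][future_t]
            # Recompute the sparse reward against the substituted goal.
            reward = reward_fn(episode['achieved_goal'][t + 1], new_goal)
            transitions.append((episode['obs'][t], episode['acts'][t],
                                new_goal, reward, episode['obs'][t + 1]))
    return transitions

# The Fetch environments give reward 0 when the achieved goal is within
# 5 cm of the desired goal and -1 otherwise.
sparse_reward = lambda achieved, goal: 0.0 if np.linalg.norm(achieved - goal) < 0.05 else -1.0

With replay_strategy set to none, the buffer keeps only the original transitions and goals; since the Fetch rewards are sparse, vanilla DDPG then rarely observes a successful transition, which is what the results below reflect.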

Results

After training, you should see learning curves similar to the figure below. With HER, the agent on FetchPickAndPlace-v1 reaches a success rate of about 0.9 in around 400 epochs. Without HER, i.e. with the original DDPG, the success rate stays between 0.03 and 0.04, meaning the agent learns essentially nothing.


Training results for the Fetch Pick and Place task: DDPG with HER (red) vs. the original DDPG (blue).