HDDPG-HER-RND

Hierarchical DDPG + Hindsight Experience Replay + Random Network Distillation

Primary language: Python. License: MIT.

HDDPG + HER + off-policy RND

This repository implements a combination of Hierarchical Deep Deterministic Policy Gradient (HDDPG), Hindsight Experience Replay (HER), and Random Network Distillation (RND). The experiments use the MuJoCo robot environments: Reach, Push, PickAndPlace, and Slide. So far, however, only the Reach and Hand Reach tasks are finished.

To run the code, first execute the command "python run_HAC.py --layers 1 --her --normalize --retrain --env reach --episodes 5000 --threadings 1". The flag names are largely self-explanatory; see option.py for the full list of flags. A "performance.jpg" plot showing the training accuracy is produced only when --threadings is 1.

Most popular curiosity-driven methods are on-policy, whereas our RND implementation is off-policy. Because the RND prediction error changes as training proceeds, the intrinsic reward cannot be stored with a transition; it is recomputed for every batch sampled from the replay buffer. We found that HER + off-policy RND may be a powerful method for sparse-reward problems.
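To make the "recompute per batch" idea concrete, here is a minimal NumPy sketch. It is not the code from this repository: the `TinyRND` class, the linear networks, the `beta` weight, and the fake replay buffer are all assumptions made for illustration. It only shows that the bonus for the same stored transitions shrinks as the predictor trains, which is why the bonus is computed at sample time.

```python
import numpy as np

rng = np.random.default_rng(0)


class TinyRND:
    """Linear target/predictor pair (hypothetical stand-in for real networks)."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.1):
        # Fixed, randomly initialised target network (never trained).
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network, trained to imitate the target.
        self.w_pred = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def intrinsic_reward(self, obs):
        # Per-sample mean squared prediction error.
        err = obs @ self.w_pred - obs @ self.w_target
        return (err ** 2).mean(axis=1)

    def update(self, obs):
        # One gradient step on the mean squared prediction error.
        err = obs @ self.w_pred - obs @ self.w_target
        grad = 2.0 * obs.T @ err / err.size
        self.w_pred -= self.lr * grad


def augmented_reward(rnd, next_obs, extrinsic, beta=0.1):
    # The bonus comes from the *current* predictor, at sample time.
    return extrinsic + beta * rnd.intrinsic_reward(next_obs)


# Replay buffer stand-in: stores observations and extrinsic rewards only.
buffer_obs = rng.normal(size=(256, 4))
buffer_rew = -np.ones(256)  # sparse extrinsic reward, assumed -1 everywhere
rnd = TinyRND(obs_dim=4)

idx = rng.integers(0, len(buffer_obs), size=32)
r_before = augmented_reward(rnd, buffer_obs[idx], buffer_rew[idx])

for _ in range(200):
    batch = buffer_obs[rng.integers(0, len(buffer_obs), size=32)]
    rnd.update(batch)

# Same transitions sampled again later: the bonus has shrunk.
r_after = augmented_reward(rnd, buffer_obs[idx], buffer_rew[idx])
print(r_before.mean(), r_after.mean())
```

Storing the bonus in the buffer would freeze it at the value it had when the transition was collected, so old transitions would keep looking novel long after the predictor had learned them.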

For more detailed information on the algorithms, see the papers "Learning Multi-Level Hierarchies with Hindsight", "Exploration by Random Network Distillation", and "Hindsight Experience Replay".
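As a rough illustration of the hindsight relabeling idea from the HER paper, the sketch below uses the simple "final" strategy: every transition of an episode is copied with its goal replaced by the goal actually achieved at the end of the episode. The dictionary layout, the `sparse_reward` helper, and the 0.05 tolerance are assumptions for this example, not this repository's data structures.

```python
import numpy as np


def sparse_reward(achieved_goal, goal, tol=0.05):
    # 0 on success, -1 otherwise (a common sparse-reward convention).
    return 0.0 if np.linalg.norm(achieved_goal - goal) <= tol else -1.0


def her_relabel_final(episode):
    """Copy each transition, substituting the goal achieved at episode end."""
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        relabeled.append({
            **t,
            "goal": new_goal,
            # The reward must be recomputed against the substituted goal.
            "reward": sparse_reward(t["achieved_goal"], new_goal),
        })
    return relabeled


# A failed episode: the gripper never gets near the original goal.
goal = np.array([1.0, 1.0])
positions = [np.array([0.1, 0.0]), np.array([0.2, 0.1]), np.array([0.3, 0.1])]
episode = [
    {"achieved_goal": p, "goal": goal, "reward": sparse_reward(p, goal)}
    for p in positions
]

relabeled = her_relabel_final(episode)
print([t["reward"] for t in episode])    # [-1.0, -1.0, -1.0]
print([t["reward"] for t in relabeled])  # [-1.0, -1.0, 0.0]
```

Both the original and the relabeled transitions would go into the replay buffer, so the agent sees at least one success signal even when the original goal was never reached.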

More details will be added later.

Thanks to the authors of HAC, HER, and RND.

Result

Comparison of all methods combined with HER in the FetchReach environment:
image

Comparison of all methods in the HandReach environment:
image

Performance:
image image

Version Log

2019/5/7 First Version

  1. Hierarchical DDPG and HER;

  2. Observation (State/Goal) Normalization;

  3. RND;

  4. Multiprocessing (so many experiments can run at the same time);

  5. Reach and Push environment;

2019/5/10 Update

  1. Use gym to create the environment class (so it is easy to use other environments);

  2. Hand Reach environment;

2019/6/21 Update

  1. Updated the experiment results;