This repository contains the implementation of Prioritized Hindsight with Dual Experience Replay, created by Sofanit Wubeshet Beyene and Ji-Hyeong Han of Seoul National University of Science and Technology.
Sharing prior knowledge across multiple robotic manipulation tasks is a challenging
research topic. Although state-of-the-art deep reinforcement learning (DRL) algorithms have
shown immense success on single robotic tasks, extending them directly to multi-task
manipulation problems remains challenging, mostly due to the difficulty of efficient
exploration in high-dimensional state and continuous action spaces. Furthermore, in multi-task scenarios, the sparse-reward problem and the sample inefficiency of DRL
algorithms are exacerbated. Therefore, we propose a method to increase the sample efficiency of the
soft actor-critic (SAC) algorithm and extend it to a multi-task setting. The agent learns a prior policy
from two structurally similar tasks and adapts that policy to a target task.
We propose prioritized hindsight with dual experience replay to improve the data storage and sampling technique, which in
turn assists the agent in performing structured exploration and leads to sample efficiency. The
proposed method separates the experience replay buffer into two buffers, one for real trajectories
and one for hindsight trajectories, to reduce the bias introduced by the hindsight trajectories.
Moreover, we reuse high-reward transitions from previous tasks to help the network
adapt quickly to a new task. We demonstrate the proposed method on several manipulation tasks
using a 7-DoF robotic arm in RLBench. The experimental results show that the proposed method
outperforms vanilla SAC in both single-task and multi-task settings.
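The dual-buffer idea described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the class name DualReplayBuffer, the hindsight_ratio parameter, and the FIFO eviction policy are all assumptions made for the example.

```python
# Sketch of a dual experience replay: real and hindsight (goal-relabeled)
# transitions are kept in separate buffers so that hindsight data does not
# bias sampling of real trajectories. A fixed ratio controls the batch mix.
import random


class DualReplayBuffer:
    def __init__(self, capacity, hindsight_ratio=0.5):
        self.capacity = capacity
        self.real = []        # transitions from actual rollouts
        self.hindsight = []   # transitions relabeled with achieved goals
        self.hindsight_ratio = hindsight_ratio

    def _store(self, buf, transition):
        if len(buf) >= self.capacity:
            buf.pop(0)        # drop the oldest transition (FIFO eviction)
        buf.append(transition)

    def store_real(self, transition):
        self._store(self.real, transition)

    def store_hindsight(self, transition):
        self._store(self.hindsight, transition)

    def sample(self, batch_size):
        # Draw a fixed fraction from the hindsight buffer and the rest from
        # the real buffer; shrink each draw if a buffer holds fewer items.
        n_h = min(int(batch_size * self.hindsight_ratio), len(self.hindsight))
        n_r = min(batch_size - n_h, len(self.real))
        return random.sample(self.hindsight, n_h) + random.sample(self.real, n_r)
```

Keeping the two buffers separate lets the sampling ratio be tuned independently of how many hindsight copies are generated, which is one way to limit the bias hindsight relabeling would otherwise introduce into a single shared buffer.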
Environment
Python 3.8
PyTorch 1.9.1
Install RLBench
env/conda_environment.yml lists all packages required to run this experiment; the environment can be created with:
conda env create -f env/conda_environment.yml
Replace the RLBench task files with the modified versions provided under env/:
env/reach_target.py replaces tasks/reach_target.py
env/close_box.py replaces tasks/close_box.py
env/close_mircowave.py replaces tasks/close_mircrowave.py
To run HER
python algo/check_multi_main.py
To run the proposed method (which uses the same sac.py)
python algo/final/final_multi_main.py