Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning

This repository contains the implementation of Prioritized Hindsight with Dual Buffer, created by Sofanit Wubeshet Beyene and Ji-Hyeong Han at Seoul National University of Science and Technology.

Abstract

Sharing prior knowledge across multiple robotic manipulation tasks is a challenging research topic. Although state-of-the-art deep reinforcement learning (DRL) algorithms have shown immense success on single robotic tasks, it remains challenging to apply them directly to multi-task manipulation problems, largely because of the difficulty of efficient exploration in high-dimensional state and continuous action spaces. Furthermore, in multi-task scenarios, the sparse-reward problem and the sample inefficiency of DRL algorithms are exacerbated. Therefore, we propose a method that increases the sample efficiency of the soft actor-critic (SAC) algorithm and extends it to the multi-task setting. The agent learns a prior policy from two structurally similar tasks and adapts that policy to a target task.
We propose prioritized hindsight with dual experience replay, which improves the data storage and sampling techniques and, in turn, helps the agent perform the structured exploration that leads to sample efficiency. The proposed method separates the experience replay buffer into two buffers, one for real trajectories and one for hindsight trajectories, to reduce the bias that hindsight trajectories introduce into the buffer. Moreover, we reuse high-reward transitions from previous tasks to help the network adapt easily to a new task. We evaluate the proposed method on several manipulation tasks using a 7-DoF robotic arm in RLBench. The experimental results show that the proposed method outperforms vanilla SAC in both single-task and multi-task settings.
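The core data structure is the dual replay buffer described above. Below is a minimal sketch of that idea, not the repository's exact implementation: the class name DualReplayBuffer, the 80/20 real-to-hindsight batch split, and the priority exponent are illustrative assumptions chosen for the example.

    import random
    from collections import deque

    import numpy as np


    class DualReplayBuffer:
        """Illustrative dual buffer: real and hindsight transitions are
        stored separately so goal-relabelled (hindsight) experience cannot
        crowd out real experience. The full method also seeds replay with
        high-reward transitions from previous tasks (not shown here)."""

        def __init__(self, capacity=100_000, real_fraction=0.8, alpha=0.6):
            self.real = deque(maxlen=capacity)       # transitions from the environment
            self.hindsight = deque(maxlen=capacity)  # goal-relabelled transitions
            self.real_prio = deque(maxlen=capacity)  # per-transition priorities
            self.hind_prio = deque(maxlen=capacity)
            self.real_fraction = real_fraction       # share of each batch drawn from the real buffer
            self.alpha = alpha                       # priority exponent, as in prioritized replay

        def add(self, transition, priority, hindsight=False):
            # Priorities are appended in lockstep with transitions, so both
            # deques stay aligned even when old entries are evicted.
            if hindsight:
                self.hindsight.append(transition)
                self.hind_prio.append(priority)
            else:
                self.real.append(transition)
                self.real_prio.append(priority)

        def _sample_from(self, buf, prios, n):
            # Sample proportionally to priority^alpha (assumes priorities > 0).
            p = np.asarray(prios, dtype=np.float64) ** self.alpha
            p /= p.sum()
            idx = np.random.choice(len(buf), size=n, p=p)
            return [buf[i] for i in idx]

        def sample(self, batch_size):
            # Assumes the real buffer is non-empty before the first update.
            n_real = int(batch_size * self.real_fraction)
            n_hind = batch_size - n_real
            batch = self._sample_from(self.real, self.real_prio, n_real)
            if len(self.hindsight) > 0:
                batch += self._sample_from(self.hindsight, self.hind_prio, n_hind)
            random.shuffle(batch)
            return batch

Drawing a fixed fraction of each batch from the real buffer keeps the relabelled hindsight transitions from dominating the updates, which is the bias the dual buffer is meant to reduce.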

Demo animations: reach target, close box, and close microwave tasks.

Installation

Requirements

Environment
 Python 3.8
 torch 1.9.1

Install RLBench

https://github.com/stepjam/RLBench
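RLBench's own README describes installation in detail; assuming CoppeliaSim and PyRep are already set up (both are RLBench prerequisites), the package itself can typically be installed with:

    pip install git+https://github.com/stepjam/RLBench.git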

How to run

Replace environment files

The env/conda_environment.yml file contains all the packages required to run this experiment.
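The environment can be created from that file with a standard conda command (the environment name is defined inside the yml):

    conda env create -f env/conda_environment.yml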


Replace tasks/reach_target.py with env/reach_target.py
Replace tasks/close_box.py with env/close_box.py
Replace tasks/close_microwave.py with env/close_microwave.py
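These replacements are plain file copies. Assuming RLBench is cloned next to this repository (the RLBench checkout path below is illustrative), they can be done with:

    cp env/reach_target.py RLBench/rlbench/tasks/reach_target.py
    cp env/close_box.py RLBench/rlbench/tasks/close_box.py
    cp env/close_microwave.py RLBench/rlbench/tasks/close_microwave.py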

Run

To run the HER baseline:

python algo/check_multi_main.py 

To run the proposed method (it uses the same sac.py):

python algo/final/final_multi_main.py