Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

Code for Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration, ICLR 2022 (Spotlight)

Video of TurtleBot Demonstration

This codebase builds on the publicly available GitHub repository Khrylx/PyTorch-RL.

To run the experiments, install the following packages, preferably in a conda virtual environment (example setup commands follow the list):

  • gym 0.18.0
  • pytorch 1.8.1
  • mujoco-py 2.0.2.13
  • tensorboard 2.5.0
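
A minimal setup sketch (assumptions, not from the repository: the environment name logo and Python version are illustrative; PyTorch installs via the pip package torch, and mujoco-py additionally requires the MuJoCo binaries to be installed separately):

# create and activate a fresh environment
conda create -n logo python=3.7
conda activate logo
# install the packages listed above
pip install gym==0.18.0 torch==1.8.1 mujoco-py==2.0.2.13 tensorboard==2.5.0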

The Python script that runs LOGO is logo/run_logo.py.

To run the code with the default parameters, execute the following command:

python run_logo.py --env-num i

where i is an integer from 1 to 8, corresponding to the following experiments:

  1. Hopper-v2
  2. Censored Hopper-v2
  3. HalfCheetah-v2
  4. Censored HalfCheetah-v2
  5. Walker2d-v2
  6. Censored Walker2d-v2
  7. InvertedDoublePendulum-v2
  8. Censored InvertedDoublePendulum-v2
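
For example, to run LOGO on Hopper-v2 with the default parameters:

python run_logo.py --env-num 1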

The TensorBoard logs will be saved in a folder titled 'Results'.
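
To monitor training, you can point TensorBoard at this folder:

tensorboard --logdir Results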

For the full-observation setting, the policy network can be initialized using behavior cloning, which enables faster learning. To do so, execute the following command:

python run_logo.py --env-num i --init-BC
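
Conceptually, behavior-cloning initialization pretrains the policy network by supervised regression on the demonstration data before RL starts. The sketch below is illustrative only and not the repository's implementation; the function name, the MSE loss, and the tensor inputs are assumptions:

import torch
import torch.nn as nn

def behavior_clone(policy, demo_states, demo_actions, epochs=100, lr=1e-3):
    # Supervised pretraining: regress demonstration actions on states.
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        pred_actions = policy(demo_states)          # policy output on demo states
        loss = loss_fn(pred_actions, demo_actions)  # imitation loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy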