This is code to accompany the NeurIPS 2023 paper Accelerating Exploration with Unlabeled Prior Data.
The code is built off from https://github.com/ikostrikov/rlpd/ and the ICVF implementation is from https://github.com/dibyaghosh/icvf_release/.
ExPLORe is licensed under CC-BY-NC, however portions of the project (indicated by the header in each file influenced) are available under separate license terms:
- rlpd (https://github.com/ikostrikov/) is licensed under the MIT license.
- icvf (https://github.com/dibyaghosh/icvf_release) is licensed under the MIT license.
./create_env
XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning.py \
--env_name=antmaze-large-diverse-v2 \
--max_steps=300000 \
--config.backup_entropy=False \
--config.num_min_qs=1 \
--offline_relabel_type=pred \
--use_rnd_offline=True \
--eval_episodes=10 \
--project_name=explore-antmaze \
--seed=0
First, download and unzip .npy
files into ~/.datasets/awac-data/
from here.
Make sure you have mjrl
installed:
git clone https://github.com/aravindr93/mjrl
cd mjrl
pip install -e .
Then, recursively clone mj_envs
from this fork:
git clone --recursive https://github.com/philipjball/mj_envs.git
Then sync the submodules (add the --init
flag if you didn't recursively clone):
$ cd mj_envs
$ git submodule update --remote
Finally:
$ pip install -e .
Now you can run the following in this directory
XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning.py \
--env_name=pen-binary-v0 \
--max_steps=1000000 \
--config.backup_entropy=False \
--offline_relabel_type=pred \
--use_rnd_offline=True \
--eval_episodes=10 \
--project_name=explore-adroit \
--seed=0
XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning.py \
--env_name=relocate-binary-v0 \
--reset_rm_every=1000 \
--max_steps=1000000 \
--config.backup_entropy=False \
--offline_relabel_type=pred \
--use_rnd_offline=True \
--project_name=explore-adroit \
--eval_episodes=10 \
--seed=0
Based on https://github.com/avisingh599/cog.
First, install roboverse for COG: https://github.com/avisingh599/roboverse, for the environment. Follow the instructions in data/README.md
to obtain the dataset.
XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning_pixels.py \
--env_name=Widow250PickTray-v0 \
--project_name=explore-cog \
--offline_relabel_type=pred \
--use_rnd_offline=True \
--use_icvf=True \
--dataset_subsample_ratio=0.1 \
--seed=0
./generate_all.sh [partition] [conda environment] [resource constraint]
will generate all the sbatch scripts (with three seeds. The paper uses 10 for AntMaze and Adroit, 20 for COG) under sbatch/
and ./submit_all.sh
will launch all of them.
Assuming that all the experiments above completed successfully with wandb tracking. The following steps can be followed to generate paper-style figures.
cd plotting
python wandb_dl.py --entity=[YOUR WANDB ENTITY/USERNAME] --domain=antmaze --project_name=release-explore-antmaze
python wandb_dl.py --entity=[YOUR WANDB ENTITY/USERNAME] --domain=adroit --project_name=release-explore-adroit
python wandb_dl.py --entity=[YOUR WANDB ENTITY/USERNAME] --domain=cog --project_name=release-explore-cog
python make_plots.py --domain=all
run_antmaze.sh
cd plotting
python visualize_maze.py
@inproceedings{
li2023accelerating,
title={Accelerating Exploration with Unlabeled Prior Data},
author={Qiyang Li and Jason Zhang and Dibya Ghosh and Amy Zhang and Sergey Levine},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=Itorzn4Kwf}
}