This repository contains the code for the paper "PolyTask: Learning Unified Policies through Behavior Distillation".
Download the expert demonstrations, weights, replay buffers, and environment libraries [link].
The link contains the following:
- The expert demonstrations for all tasks in the paper.
- The weight files for behavior cloning (BC) and demonstration-guided RL (ROT).
- The relabeled replay buffers for all tasks in the paper.
- The supporting libraries for environments (Meta-World, Franka Kitchen) in the paper.
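For reference, the extracted archive should contain folders along these lines (the `expert_demos`, `weights`, and `buffers` names are used in the setup steps below; the environment library folder names are guesses based on the install commands later in this README):

```
expert_demos/   # expert demonstrations for all tasks
weights/        # BC and ROT weight files
buffers/        # relabeled replay buffers
metaworld/      # Meta-World environment library
d4rl/           # Franka Kitchen (D4RL) environment library
```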
- Extract the files provided in the link.
- Set the `path/to/dir` portion of the `root_dir` path variable in `cfgs/config*.yaml` to the path of the `PolyTask` repository.
- Place the `expert_demos`, `weights`, and `buffers` folders in a common directory `${data_dir}`.
- Set the `path/to/dir` portion of the `data_dir` path variable in `cfgs/config*.yaml` to the path of the common data directory (see the illustrative excerpt below).
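For example, after editing, the relevant entries in `cfgs/config.yaml` might look like the following. This is an illustrative sketch with hypothetical paths; check the exact key names and layout against the actual config files.

```
# illustrative excerpt of cfgs/config.yaml (hypothetical paths)
root_dir: /home/user/PolyTask   # path to this repository
data_dir: /home/user/data       # directory holding expert_demos, weights, and buffers
```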
- Install the system packages needed for MuJoCo-based rendering:
```
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
```
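On a headless machine, you may also need to select a rendering backend. MuJoCo-based libraries such as `dm_control` read it from the `MUJOCO_GL` environment variable (a general MuJoCo convention, not a repo-specific requirement):

```
# 'egl' for GPU-accelerated headless rendering, 'osmesa' for CPU-only
export MUJOCO_GL=egl
```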
- Set up the conda environment:
```
conda env create -f conda_env.yml
conda activate polytask
```
- Download the Meta-World benchmark suite and its demonstrations from here. Install the simulation environment using the following command:
```
pip install -e /path/to/dir/metaworld
```
- Download the D4RL benchmark for the Franka Kitchen environments from here. Install the simulation environment using the following command:
```
pip install -e /path/to/dir/d4rl
```
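As a quick sanity check (a suggestion, not part of the original instructions), both packages should import cleanly once installed:

```
python -c "import metaworld, d4rl; print('environment libraries OK')"
```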
To run the code, enter the code directory with `cd polytask` and execute the following commands.
- Train BC agent:
```
python train.py agent=bc suite=dmc obs_type=features suite.task_id=[1] num_demos_per_task=10
python train.py agent=bc suite=metaworld obs_type=pixels suite/metaworld_task=hammer num_demos_per_task=1
python train.py agent=bc suite=kitchen obs_type=pixels suite.task=['task1'] num_demos_per_task=100
python train_robot.py agent=bc suite=robotgym obs_type=pixels suite/robotgym_task=boxopen num_demos_per_task=1
```
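To train single-task BC policies for several tasks in one go, a simple shell loop over the task argument works. The task names below are placeholders; only `hammer` appears in the original commands, so check the suite configs for the valid list:

```
# hypothetical Meta-World task names other than 'hammer'
for task in hammer drawer-close door-open; do
    python train.py agent=bc suite=metaworld obs_type=pixels \
        suite/metaworld_task=$task num_demos_per_task=1
done
```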
- Train demonstration-guided RL (ROT):
```
python train.py agent=drqv2 suite=dmc obs_type=features suite.task_id=[1] num_demos_per_task=10 load_bc=true bc_regularize=true
python train.py agent=drqv2 suite=metaworld obs_type=pixels suite/metaworld_task=hammer num_demos_per_task=1 load_bc=true bc_regularize=true
python train.py agent=drqv2 suite=kitchen obs_type=pixels suite.task=['task1'] num_demos_per_task=100 load_bc=true bc_regularize=true
python train_robot.py agent=drqv2 suite=robotgym obs_type=pixels suite/robotgym_task=boxopen num_demos_per_task=1 load_bc=true bc_regularize=true
```
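Note that `load_bc=true` presumes a BC checkpoint for the task is available, either trained with the commands above or taken from the provided `weights` folder (a reading of the flag based on its name and the released weight files). A typical per-task pipeline is therefore BC first, then ROT:

```
# sketch: BC pretraining followed by BC-regularized RL on the same task
python train.py agent=bc suite=metaworld obs_type=pixels suite/metaworld_task=hammer num_demos_per_task=1
python train.py agent=drqv2 suite=metaworld obs_type=pixels suite/metaworld_task=hammer \
    num_demos_per_task=1 load_bc=true bc_regularize=true
```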
- Train PolyTask:
```
python train_distill.py agent=distill suite=dmc obs_type=features num_envs_to_distil=10
python train_distill.py agent=distill suite=metaworld obs_type=pixels num_envs_to_distil=16
python train_distill.py agent=distill suite=kitchen obs_type=pixels num_envs_to_distil=6
python train_robot_distill.py agent=distill suite=robotgym obs_type=pixels num_envs_to_distil=6
```
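Assuming `num_envs_to_distil` sets how many single-task policies are distilled into the unified policy (an inference from the flag name, not stated in the original), a smaller value gives a cheap smoke test:

```
# distill only 2 DeepMind Control tasks as a quick check
python train_distill.py agent=distill suite=dmc obs_type=features num_envs_to_distil=2
```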
- Monitor results:
```
tensorboard --logdir exp_local
```
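When training on a remote machine, TensorBoard can be bound to all interfaces and a chosen port with its standard flags:

```
tensorboard --logdir exp_local --port 6006 --bind_all
```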
The code for the baseline algorithms is available in the `baselines` branch of this repository. The instructions to run it are in the `README.md` file of the `baselines` branch.
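To switch to it from a clone of this repository:

```
git fetch origin
git checkout baselines
```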
If you use this code, please cite the paper:

```
@article{haldar2023polytask,
  title={PolyTask: Learning Unified Policies through Behavior Distillation},
  author={Haldar, Siddhant and Pinto, Lerrel},
  journal={arXiv preprint arXiv:2310.08573},
  year={2023}
}
```