This is the code for Choreographer: a model-based agent that discovers and learns unsupervised skills in latent imagination, and it's able to efficiently coordinate and adapt the skills to solve downstream tasks.
If you find the code useful, please cite our work using:
```bibtex
@inproceedings{
  Mazzaglia2023Choreographer,
  title={Choreographer: Learning and Adapting Skills in Imagination},
  author={Pietro Mazzaglia and Tim Verbelen and Bart Dhoedt and Alexandre Lacoste and Sai Rajeswar},
  booktitle={International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=PhkWyijGi5b}
}
```
We assume you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is to create an Anaconda environment by running:
conda env create -f conda_env.yml
After the installation ends, you can activate your environment with:
conda activate choreo
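As an optional sanity check (a sketch, assuming the environment installs PyTorch with CUDA support), you can verify that the GPU is visible:

```sh
# Optional sanity check: print the installed PyTorch version and whether CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```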
| Agent | Command |
|---|---|
| Choreographer | `agent=choreo` |
We support the following domains and tasks.
| Domain | Tasks |
|---|---|
| walker | `stand`, `walk`, `run`, `flip` |
| quadruped | `walk`, `run`, `stand`, `jump` |
| jaco | `reach_top_left`, `reach_top_right`, `reach_bottom_left`, `reach_bottom_right` |
| mw | `reach` |
The datasets from the ExORL paper can be downloaded from their repository. The default loading directory is:
~/urlb_datasets/${dataset}/${domain}/${collection_method}/buffer
For example:
~/urlb_datasets/exorl/walker/rnd/buffer
The dataset folder can be changed in the `offline_train.yaml` file, by replacing the value at the key `dataset_dir`, or it can be provided as a command-line argument when launching the offline pre-training.
For example:
python offline_train.py ... dataset_dir=$MYPATH/exorl/walker/rnd/buffer
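If you prefer to keep the default loading directory, a minimal shell sketch for placing a downloaded buffer under the expected path could look like the following (the source path `./exorl_downloads/...` is a placeholder for wherever you downloaded the ExORL data):

```sh
# Sketch: move a downloaded ExORL buffer into the default loading directory.
# ./exorl_downloads/walker/rnd/buffer is a placeholder for your actual download location.
mkdir -p ~/urlb_datasets/exorl/walker/rnd
cp -r ./exorl_downloads/walker/rnd/buffer ~/urlb_datasets/exorl/walker/rnd/buffer
ls ~/urlb_datasets/exorl/walker/rnd/buffer | head   # the episode files should be listed here
```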
To run pre-training from offline data, use the `offline_train.py` script:
python offline_train.py configs=dmc_states agent=choreo dataset=exorl collection_method=rnd domain=walker seed=1
This script will produce several agent snapshots after training for 10k, 50k, 100k, and 200k update steps. The snapshots will be stored under the following directory:
./offline_models/${dataset}/${collection_method}/${domain}/${agent.name}/${seed}
For example:
./offline_models/exorl/rnd/walker/choreo/1
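Since the snapshot directory is indexed by seed, a simple sketch for repeating the offline pre-training over multiple seeds is:

```sh
# Sketch: offline pre-training on walker (ExORL / RND data) over several seeds
for SEED in 1 2 3; do
  python offline_train.py configs=dmc_states agent=choreo dataset=exorl \
    collection_method=rnd domain=walker seed=$SEED
done
```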
To run pre-training in parallel to (LBS) exploration, use the `pretrain.py` script:
python pretrain.py configs=dmc_pixels agent=choreo domain=walker
This script will produce several agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots will be stored under the following directory:
./pretrained_models/${obs_type}/${domain}/${agent.name}/${seed}
For example:
./pretrained_models/pixels/walker/choreo/1
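For instance, a sketch for pre-training from pixels on both locomotion domains (assuming `seed` can be overridden from the command line, as the seed-indexed snapshot path suggests):

```sh
# Sketch: pixel-based pre-training with exploration on both locomotion domains
for DOMAIN in walker quadruped; do
  python pretrain.py configs=dmc_pixels agent=choreo domain=$DOMAIN seed=1
done
```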
Once you have pre-trained your agent, you can fine-tune it on a downstream task.
For example, let's say you have pre-trained Choreographer on ExORL data from the walker domain; you can fine-tune it on `walker_run` by running the following command:
python finetune.py configs=dmc_states agent=choreo task=walker_run from_offline=True dataset=exorl collection_method=rnd snapshot_ts=200000 seed=1
This will load the snapshot stored in `./offline_models/exorl/rnd/walker/choreo/1/snapshot_200000.pt`, initialize Choreographer's models and skill policies with it, and start training on `walker_run`, using the meta-controller to maximize the extrinsic rewards of the task.
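The same snapshot can be reused for every task in the domain. A sketch, assuming all walker tasks follow the `walker_<task>` naming used above:

```sh
# Sketch: fine-tune one offline snapshot on all walker tasks
for TASK in walker_stand walker_walk walker_run walker_flip; do
  python finetune.py configs=dmc_states agent=choreo task=$TASK from_offline=True \
    dataset=exorl collection_method=rnd snapshot_ts=200000 seed=1
done
```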
Similarly, let's say you have pre-trained Choreographer on the walker domain from pixels; you can fine-tune it on `walker_run` by running the following command:
python finetune.py configs=dmc_pixels agent=choreo task=walker_run snapshot_ts=2000000 seed=1
This will load the snapshot stored in `./pretrained_models/pixels/walker/choreo/1/snapshot_2000000.pt`, initialize Choreographer's models and skill policies with it, and start training on `walker_run`, using the meta-controller to maximize the extrinsic rewards of the task.
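If you want to compare checkpoints taken at different amounts of pre-training, a sketch that sweeps over the saved snapshots (100k, 500k, 1M, and 2M frames) is:

```sh
# Sketch: fine-tune from each pixel pre-training snapshot
for TS in 100000 500000 1000000 2000000; do
  python finetune.py configs=dmc_pixels agent=choreo task=walker_run snapshot_ts=$TS seed=1
done
```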
To perform zero-shot evaluation on Jaco, ensure that you pre-trained the model in the pixel setting and that it is correctly stored under `pretrained_models`. Then, you can run:
python finetune.py configs=dmc_pixels task=$JACO_TASK snapshot_ts=2000000 num_train_frames=10 num_eval_episodes=100 eval_goals=True
where $JACO_TASK is one of the Jaco reach tasks listed in the table above.
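A sketch for evaluating on all four reach goals, assuming the task names follow the `<domain>_<task>` convention used elsewhere in this README (e.g. `jaco_reach_top_left`):

```sh
# Sketch: zero-shot evaluation over the four Jaco reach tasks
for JACO_TASK in jaco_reach_top_left jaco_reach_top_right jaco_reach_bottom_left jaco_reach_bottom_right; do
  python finetune.py configs=dmc_pixels task=$JACO_TASK snapshot_ts=2000000 \
    num_train_frames=10 num_eval_episodes=100 eval_goals=True
done
```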
To pre-train on the Meta-World reach environment, you can run the following command:
python pretrain.py configs=mw_pixels domain=mw
To perform zero-shot evaluation on Meta-World, ensure that you pre-trained the model and that it is correctly stored under `pretrained_models`. Then, you can run:
python finetune.py configs=mw_pixels task=mw_reach snapshot_ts=2000000 task_id=$TASK_ID num_train_frames=10 num_eval_episodes=100 eval_goals=True agent.update_skill_every_step=50
where $TASK_ID is a value in `range(0, 50)`. The goals are stored under `mw_tasks/reach_harder`. There are 10 sets of goals, which can be selected by setting the evaluation seed, e.g. `seed=0`.
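A sketch for evaluating over several of the 50 goals of one goal set (here the set selected by `seed=0`):

```sh
# Sketch: zero-shot Meta-World evaluation on the first few goals of one goal set
for TASK_ID in 0 1 2 3 4; do
  python finetune.py configs=mw_pixels task=mw_reach snapshot_ts=2000000 task_id=$TASK_ID \
    num_train_frames=10 num_eval_episodes=100 eval_goals=True \
    agent.update_skill_every_step=50 seed=0
done
```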
Logs are stored in the `exp_local` folder. To launch tensorboard, run:
tensorboard --logdir exp_local
The console output is also available in the following form:
| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42
A training entry decodes as:
F : total number of environment frames
S : total number of agent steps
E : total number of episodes
L : episode length
R : episode return
FPS: training throughput (frames per second)
T : total training time
You can also use Weights & Biases by launching the experiments with `use_wandb=True`.
The environment implementations come from URLB. The model implementation is inspired by DreamerV2. We also thank the authors of Meta-World.