Official code repo for the paper "Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning" by Chongyi Zheng, Jens Tuyls, Joanne Peng, and Benjamin Eysenbach. The paper introduces a new method, Contrastive Successor Features (CSF), which achieves performance comparable to current SOTA unsupervised skill discovery methods while relying, at its core, on mutual information maximization.
After cloning this repo, please run the following commands at the root of the project:
# Setting up the conda environment
conda create --name csf python=3.9
conda activate csf
# Installing dependencies
pip install -r requirements.txt --no-deps
pip install -e .
pip install -e garaged
pip install --upgrade joblib
pip install patchelf
Warning
Pip might complain about incompatible versions -- this is expected and can be safely ignored.
Next, we need to do some Mujoco setup.
conda activate csf
conda install -c conda-forge glew
conda install -c conda-forge mesalib
conda install -c anaconda mesa-libgl-cos6-x86_64
conda install -c menpo glfw3
We also need to tell Mujoco which backend to use. This can be done by setting the appropriate environment variables.
conda env config vars set MUJOCO_GL=egl PYOPENGL_PLATFORM=egl
conda deactivate && conda activate csf
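After re-activating the environment, it can be worth confirming that both variables actually took effect. The helper below is a minimal sketch and not part of this repo; the function name check_mujoco_backend is hypothetical, and it only compares the two variables against the egl value set above.

```shell
# Hypothetical helper (not shipped with this repo): report whether the
# rendering backend variables match the "egl" value configured above.
check_mujoco_backend() {
    if [ "${MUJOCO_GL:-}" = "egl" ] && [ "${PYOPENGL_PLATFORM:-}" = "egl" ]; then
        echo "ok"
    else
        # Print the current values so a mismatch is easy to spot.
        echo "mismatch: MUJOCO_GL=${MUJOCO_GL:-unset} PYOPENGL_PLATFORM=${PYOPENGL_PLATFORM:-unset}"
        return 1
    fi
}
```

Calling check_mujoco_backend inside the activated csf environment should print ok if the conda env config step succeeded.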
If you don't already have Mujoco, you will need it. Install Mujoco in a folder called .mujoco. More instructions on how to do so are linked here.
Finally, you may want to add the following environment variables to your .bashrc file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
export CPATH=$CONDA_PREFIX/include
Remember to source your .bashrc file after changing it: source ~/.bashrc
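To check that the exports were picked up in a new shell, one option is to test whether a given directory appears in LD_LIBRARY_PATH. This is a sketch under the assumption that entries are colon-separated, as in the exports above; the function name path_list_contains is hypothetical.

```shell
# Hypothetical helper: succeed if the colon-separated list in $1
# contains the entry $2 as an exact element.
path_list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, path_list_contains "$LD_LIBRARY_PATH" /usr/lib/nvidia && echo found prints found only when that directory is on the library path.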
(1) For unsupervised pretraining (state coverage), you can use the following general command. Make sure to run this from the root of the project.
sh scripts/pretrain/[method_name]/[method_name]_[env_name].sh
For example, in order to run our CSF method on the Ant environment, you would run:
sh scripts/pretrain/csf/csf_ant.sh
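The naming pattern above can also be expanded programmatically, which may help when launching several runs. The function below is only an illustration of the pattern; pretrain_script is a hypothetical name, and it does not check that the resulting script exists in the repo.

```shell
# Hypothetical helper: build the pretraining script path from a method
# name ($1) and an environment name ($2), following the pattern
# scripts/pretrain/[method_name]/[method_name]_[env_name].sh
pretrain_script() {
    echo "scripts/pretrain/${1}/${1}_${2}.sh"
}
```

With this, sh "$(pretrain_script csf ant)" is equivalent to the csf_ant.sh command shown above.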
Note
The zero-shot goal reaching performance gets logged during the pretraining phase, and hence we don't have separate scripts for them.
(2) For hierarchical control, you can use the following general command. Again, make sure to run this from the root of the project.
sh scripts/hierarchical_control/[task].sh [method_name]
For example, in order to run a pretrained CSF policy on the AntMultiGoal environment, you would run:
sh scripts/hierarchical_control/ant_multi_goal.sh csf
Warning
All hierarchical control experiments require a pretrained policy path, specified through the cp_path argument. Make sure to update this in the corresponding scripts.
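One quick way to find where a script sets this argument is to search it for cp_path. The snippet is a sketch that assumes the argument name appears literally in the script text; show_cp_path_lines is a hypothetical helper name.

```shell
# Hypothetical helper: print every line (with its line number) of the
# script in $1 that mentions cp_path, so it can be edited by hand.
show_cp_path_lines() {
    grep -n -- "cp_path" "$1"
}
```

For example, show_cp_path_lines scripts/hierarchical_control/ant_multi_goal.sh lists the lines to update before launching that task.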
Once experiments are running, they will be logged under the exp folder.
Note
All experiments were run on a single GPU, usually with 8 to 10 workers (see the --n_parallel flag). In addition, we found we needed 32GB of CPU memory (RAM) for all state-based experiments (Ant and HalfCheetah) and 40GB of CPU memory for all image-based experiments (Humanoid, Quadruped, Kitchen, Robobin).
This code repo was built on the original METRA repo.