Information Bottleneck Option Learning (IBOL)

This is the code for our paper, Unsupervised Skill Discovery with Bottleneck Option Learning (ICML 2021).

It includes the implementation of IBOL: the linearizer, the skill discovery method built on top of it, and the downstream tasks used to evaluate the discovered skills.
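For orientation, the sketch below illustrates this two-level structure in plain PyTorch: a high-level skill policy produces a target in the linearized space, and the pretrained linearizer turns that target into a raw environment action. All class names, network shapes, and dimensions here are illustrative assumptions, not the actual modules in this repository.

    import torch
    import torch.nn as nn

    class Linearizer(nn.Module):
        """Illustrative stand-in for the pretrained low-level (linearizer) policy."""
        def __init__(self, obs_dim, goal_dim, action_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim + goal_dim, 64), nn.Tanh(),
                                     nn.Linear(64, action_dim))

        def forward(self, obs, goal):
            # Map the observation and a linearized-space target to a raw action.
            return self.net(torch.cat([obs, goal], dim=-1))

    class SkillPolicy(nn.Module):
        """Illustrative stand-in for the high-level policy trained by skill discovery."""
        def __init__(self, obs_dim, skill_dim, goal_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim + skill_dim, 64), nn.Tanh(),
                                     nn.Linear(64, goal_dim))

        def forward(self, obs, skill):
            # Map the observation and a skill latent to a target for the linearizer.
            return self.net(torch.cat([obs, skill], dim=-1))

    # Hierarchical composition: skill latent -> linearized-space target -> raw action.
    obs, skill = torch.zeros(1, 27), torch.zeros(1, 2)   # dimensions are illustrative
    high, low = SkillPolicy(27, 2, 2), Linearizer(27, 2, 8)
    action = low(obs, high(obs, skill))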

Citing the paper

If you find our work or this code useful in your research, please cite

@inproceedings{kim2021_ibol,
    title={Unsupervised Skill Discovery with Bottleneck Option Learning},
    author={Kim, Jaekyeom and Park, Seohong and Kim, Gunhee},
    booktitle={International Conference on Machine Learning (ICML)},
    year={2021}
}

Example Skills

We show some example skills discovered by IBOL in four MuJoCo environments without rewards.

Ant

Locomotion skills

Rotation skills (complementary to Figure 6 in the main paper)

Humanoid

HalfCheetah

Hopper (at 5x speed)

Requirements

This code has been tested in environments with the following configuration (a quick check is sketched after the list):

  • Ubuntu 16.04 machine
  • CUDA-compatible GPUs
  • Python 3.7.10
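
A minimal way to verify the Python and GPU parts of this setup (assuming PyTorch from requirements.txt is already installed):

    import sys
    import torch

    # The code is tested with Python 3.7.10 and CUDA-compatible GPUs.
    print(sys.version)                 # expect 3.7.x
    print(torch.cuda.is_available())   # expect True when a CUDA GPU is visible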

Environment Setup

  1. Install the MuJoCo 2.0 binaries, following the official instructions. Note that multiple licensing options are offered, including a 30-day free trial.
  2. At the top-level directory, run the following command to set up the environment (a sanity check for the MuJoCo installation is sketched after these steps).
    pip install --no-deps -r requirements.txt
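
Once both steps are done, a minimal sanity check for the MuJoCo installation (assuming mujoco-py is among the pinned requirements) is:

    import mujoco_py

    # Compiling and stepping a trivial model exercises the MuJoCo 2.0 binaries
    # and the license key.
    model = mujoco_py.load_model_from_xml(
        "<mujoco><worldbody><body><geom size='0.1'/></body></worldbody></mujoco>")
    sim = mujoco_py.MjSim(model)
    sim.step()
    print("MuJoCo OK")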
    

Training

Linearizer

Run the command for the target environment:

  • Ant
      python tests/main.py --train_type linearizer --env ant
  • HalfCheetah
      python tests/main.py --train_type linearizer --env half_cheetah
  • Hopper
      python tests/main.py --train_type linearizer --env hopper
  • Humanoid
      python tests/main.py --train_type linearizer --env humanoid
  • D'Kitty Randomized
      python tests/main.py --train_type linearizer --env dkitty_randomized
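
Each run stores a sampling_policy.pt checkpoint in its experiment directory under exp/, which the skill discovery commands below take via --cp_path. A rough way to inspect such a checkpoint (whether it stores a full module or only a state dict is an assumption here):

    import torch

    # Load the linearizer checkpoint on CPU and peek at what it contains.
    ckpt = torch.load("exp/L_ANT/sampling_policy.pt", map_location="cpu")
    print(type(ckpt))
    if isinstance(ckpt, dict):
        print(list(ckpt.keys()))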

Skill discovery

Run the command for the target environment:

  • Ant
      python tests/main.py --train_type skill_discovery --env ant --cp_path "exp/L_ANT/sampling_policy.pt"
  • HalfCheetah
      python tests/main.py --train_type skill_discovery --env half_cheetah --cp_path "exp/L_HC/sampling_policy.pt"
  • Hopper
      python tests/main.py --train_type skill_discovery --env hopper --cp_path "exp/L_HP/sampling_policy.pt"
  • Humanoid
      python tests/main.py --train_type skill_discovery --env humanoid --cp_path "exp/L_HUM/sampling_policy.pt"
  • D'Kitty Randomized
      python tests/main.py --train_type skill_discovery --env dkitty_randomized --cp_path "exp/L_DK/sampling_policy.pt"

Downstream tasks

Run the command for the target task:

  • AntGoal
      python tests/main.py --train_type downstream --env ant_goal --cp_path "exp/L_ANT/sampling_policy.pt" --dcp_path "exp/S_ANT/option_policy.pt"
  • AntMultiGoals
      python tests/main.py --train_type downstream --env ant_multi_goals --cp_path "exp/L_ANT/sampling_policy.pt" --dcp_path "exp/S_ANT/option_policy.pt"
  • CheetahGoal
      python tests/main.py --train_type downstream --env half_cheetah_goal --cp_path "exp/L_CH/sampling_policy.pt" --dcp_path "exp/S_CH/option_policy.pt"
  • CheetahImitation
      python tests/main.py --train_type downstream --env half_cheetah_imi --cp_path "exp/L_CH/sampling_policy.pt" --dcp_path "exp/S_CH/option_policy.pt"

Evaluation

  • Each training command stores its results in an experiment directory under exp/.
  • In each experiment directory, the plots/ directory (image files) or tb_plot/ (TensorBoard logs) contains qualitative visualizations.
  • For downstream tasks, examine the TrainSp/IOD/SmoothedReward500 column in progress.csv, as in the sketch below.
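
For example, that column can be read and plotted as follows (pandas and matplotlib are assumptions here, not necessarily part of requirements.txt; the experiment directory name is hypothetical):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Adjust the path to your own downstream experiment directory under exp/.
    df = pd.read_csv("exp/YOUR_DOWNSTREAM_RUN/progress.csv")

    # Smoothed downstream reward over training.
    df["TrainSp/IOD/SmoothedReward500"].plot()
    plt.xlabel("training iteration")
    plt.ylabel("TrainSp/IOD/SmoothedReward500")
    plt.show()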

Acknowledgments

This code is based on garage.