Information Bottleneck Option Learning (IBOL)

This is the code for our paper, Unsupervised Skill Discovery with Bottleneck Option Learning (ICML 2021).

It includes the implementation of IBOL: the linearizer, the skill discovery method built on top of it, and the downstream tasks used to evaluate the discovered skills.
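For orientation, the sketch below illustrates this two-level structure in plain PyTorch: a high-level skill policy produces a target in the linearized space, and the pretrained linearizer turns that target into a raw environment action. All class names, network shapes, and dimensions here are illustrative assumptions, not the actual modules in this repository.

    import torch
    import torch.nn as nn

    class Linearizer(nn.Module):
        """Illustrative stand-in for the pretrained low-level (linearizer) policy."""
        def __init__(self, obs_dim, goal_dim, action_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim + goal_dim, 64), nn.Tanh(),
                                     nn.Linear(64, action_dim))

        def forward(self, obs, goal):
            # Map the observation and a linearized-space target to a raw action.
            return self.net(torch.cat([obs, goal], dim=-1))

    class SkillPolicy(nn.Module):
        """Illustrative stand-in for the high-level policy trained by skill discovery."""
        def __init__(self, obs_dim, skill_dim, goal_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim + skill_dim, 64), nn.Tanh(),
                                     nn.Linear(64, goal_dim))

        def forward(self, obs, skill):
            # Map the observation and a skill latent to a target for the linearizer.
            return self.net(torch.cat([obs, skill], dim=-1))

    # Hierarchical composition: skill latent -> linearized-space target -> raw action.
    obs, skill = torch.zeros(1, 27), torch.zeros(1, 2)   # dimensions are illustrative
    high, low = SkillPolicy(27, 2, 2), Linearizer(27, 2, 8)
    action = low(obs, high(obs, skill))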

Citing the paper

If you find our work or this code useful in your research, please cite

@inproceedings{kim2021_ibol,
    title={Unsupervised Skill Discovery with Bottleneck Option Learning},
    author={Kim, Jaekyeom and Park, Seohong and Kim, Gunhee},
    booktitle={International Conference on Machine Learning (ICML)},
    year={2021}
}

Example Skills

We show some example skills discovered by IBOL in four MuJoCo environments without rewards.

Ant

Locomotion skills

Rotation skills (complementary to Figure 6 in the main paper)

Humanoid

HalfCheetah

Hopper (at 5x speed)

Requirements

This code has been tested in environments with the following configuration (a quick check is sketched after the list):

  • Ubuntu 16.04 machine
  • CUDA-compatible GPUs
  • Python 3.7.10
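
A minimal way to verify the Python and GPU parts of this setup (assuming PyTorch from requirements.txt is already installed):

    import sys
    import torch

    # The code is tested with Python 3.7.10 and CUDA-compatible GPUs.
    print(sys.version)                 # expect 3.7.x
    print(torch.cuda.is_available())   # expect True when a CUDA GPU is visible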

Environment Setup

  1. Install the MuJoCo 2.0 binaries, following the official instructions. Note that multiple licensing options are offered, including a 30-day free trial.
  2. At the top-level directory, run the following command to set up the environment (a sanity check for the MuJoCo installation is sketched after these steps).
    pip install --no-deps -r requirements.txt
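
Once both steps are done, a minimal sanity check for the MuJoCo installation (assuming mujoco-py is among the pinned requirements) is:

    import mujoco_py

    # Compiling and stepping a trivial model exercises the MuJoCo 2.0 binaries
    # and the license key.
    model = mujoco_py.load_model_from_xml(
        "<mujoco><worldbody><body><geom size='0.1'/></body></worldbody></mujoco>")
    sim = mujoco_py.MjSim(model)
    sim.step()
    print("MuJoCo OK")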
    

Training

Linearizer

Run the command for the target environment:

  • Ant
      python tests/main.py --train_type linearizer --env ant
  • HalfCheetah
      python tests/main.py --train_type linearizer --env half_cheetah
  • Hopper
      python tests/main.py --train_type linearizer --env hopper
  • Humanoid
      python tests/main.py --train_type linearizer --env humanoid
  • D'Kitty Randomized
      python tests/main.py --train_type linearizer --env dkitty_randomized
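
Each run stores a sampling_policy.pt checkpoint in its experiment directory under exp/, which the skill discovery commands below take via --cp_path. A rough way to inspect such a checkpoint (whether it stores a full module or only a state dict is an assumption here):

    import torch

    # Load the linearizer checkpoint on CPU and peek at what it contains.
    ckpt = torch.load("exp/L_ANT/sampling_policy.pt", map_location="cpu")
    print(type(ckpt))
    if isinstance(ckpt, dict):
        print(list(ckpt.keys()))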

Skill discovery

Run the command for the target environment:

  • Ant
      python tests/main.py --train_type skill_discovery --env ant --cp_path "exp/L_ANT/sampling_policy.pt"
  • HalfCheetah
      python tests/main.py --train_type skill_discovery --env half_cheetah --cp_path "exp/L_HC/sampling_policy.pt"
  • Hopper
      python tests/main.py --train_type skill_discovery --env hopper --cp_path "exp/L_HP/sampling_policy.pt"
  • Humanoid
      python tests/main.py --train_type skill_discovery --env humanoid --cp_path "exp/L_HUM/sampling_policy.pt"
  • D'Kitty Randomized
      python tests/main.py --train_type skill_discovery --env dkitty_randomized --cp_path "exp/L_DK/sampling_policy.pt"

Downstream tasks

Run the command for the target task:

  • AntGoal
      python tests/main.py --train_type downstream --env ant_goal --cp_path "exp/L_ANT/sampling_policy.pt" --dcp_path "exp/S_ANT/option_policy.pt"
  • AntMultiGoals
      python tests/main.py --train_type downstream --env ant_multi_goals --cp_path "exp/L_ANT/sampling_policy.pt" --dcp_path "exp/S_ANT/option_policy.pt"
  • CheetahGoal
      python tests/main.py --train_type downstream --env half_cheetah_goal --cp_path "exp/L_CH/sampling_policy.pt" --dcp_path "exp/S_CH/option_policy.pt"
  • CheetahImitation
      python tests/main.py --train_type downstream --env half_cheetah_imi --cp_path "exp/L_CH/sampling_policy.pt" --dcp_path "exp/S_CH/option_policy.pt"

Evaluation

  • Each training command stores its results in an experiment directory under exp/.
  • In each experiment directory, the plots/ directory (image files) or tb_plot/ (TensorBoard logs) contains qualitative visualizations.
  • For downstream tasks, examine the TrainSp/IOD/SmoothedReward500 column in progress.csv, as in the sketch below.
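
For example, that column can be read and plotted as follows (pandas and matplotlib are assumptions here, not necessarily part of requirements.txt; the experiment directory name is hypothetical):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Adjust the path to your own downstream experiment directory under exp/.
    df = pd.read_csv("exp/YOUR_DOWNSTREAM_RUN/progress.csv")

    # Smoothed downstream reward over training.
    df["TrainSp/IOD/SmoothedReward500"].plot()
    plt.xlabel("training iteration")
    plt.ylabel("TrainSp/IOD/SmoothedReward500")
    plt.show()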

Acknowledgments

This code is based on garage.