DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

This repository is the official implementation of DRIBO. Our implementation is based on CURL by Misha Laskin and SAC+AE by Denis Yarats.

Installation

Install dm_control with MuJoCo Pro 2.0;

pip install dm_control
pip install git+git://github.com/denisyarats/dmc2gym.git

All of the python dependencies are in the setup.py file. They can be installed manually or with the following command:
```
pip install -e .
```
Running the natural video setting You can download the Kinetics dataset to replicate our setup.

Grab the "arranging_flower" label from the train dataset to replace backgrounds during training. The videos are in folder ../kinetics-downloader/dataset/train/arranging_flowers.
```
python download.py --classes 'arranging flowers'
```
Download the test dataset to replace backgrounds during testing. The videos are in folder ../kinetics-downloader/dataset/test.
```
python download.py --test
```

Instructions

To train a DRIBO agent on the cartpole swingup task under the clean setting run ./script/run_clean_bg_cartpole_im84_dim1024_no_stacked_frames.sh from the root of this directory. The run_clean_bg_cartpole_im84_dim1024_no_stacked_frames.sh file contains the following command, which you can modify to try different environments / hyperparamters.

CUDA_VISIBLE_DEVICES=0 python train.py \
    --domain_name cartpole \
    --task_name swingup \
    --encoder_type rssm --work_dir ./clean_log \
    --action_repeat 8 --num_eval_episodes 8 \
    --pre_transform_image_size 100 --image_size 84 --kl_balance \
    --agent DRIBO_sac --frame_stack 1 --encoder_feature_dim 1024 --save_model  \
    --seed 0 --critic_lr 1e-5 --actor_lr 1e-5 --eval_freq 10000 --batch_size 8 --num_train_steps 890000

To train a DRIBO agent on the cartpole swingup task under the natural video setting run ./script/run_noisy_bg_cartpole_im84_dim1024_no_stacked_frames.sh from the root of this directory. The run_noisy_bg_cartpole_im84_dim1024_no_stacked_frames.sh file contains the following command, which you can modify to try different environments / hyperparamters.

CUDA_VISIBLE_DEVICES=0 python train.py \
    --domain_name cartpole \
    --task_name swingup \
    --encoder_type rssm --work_dir ./log \
    --action_repeat 8 --num_eval_episodes 8 --kl_balance \
    --pre_transform_image_size 100 --image_size 84 --noisy_bg \
    --agent DRIBO_sac --frame_stack 1 --encoder_feature_dim 1024 --save_model  \
    --seed 0 --critic_lr 1e-5 --actor_lr 1e-5 --eval_freq 10000 --batch_size 8 --num_train_steps 890000

The console output is available in a form:

| train | E: 1 | S: 1000 | D: 34.7 s | R: 0.0000 | BR: 0.0000 | A_LOSS: 0.0000 | CR_LOSS: 0.0000 | MIB_LOSS: 0.0000 | skl: 0.0000 | beta: 0.0E+00

a training entry decodes as:

train - training episode
E - total number of episodes
S - total number of environment steps
D - duration in seconds to train 1 episode
R - episode reward
BR - average reward of sampled batch
A_LOSS - average loss of actor
CR_LOSS - average loss of critic
MIB_LOSS - average DRIBO loss
skl - average value of symmetrized KL divergence
beta - value of coefficient beta

while an evaluation entry:

| eval | S: 0 | ER: 22.1371

which just tells the expected reward ER evaluating current policy after S steps. Note that ER is average evaluation performance over num_eval_episodes episodes (usually 8).

Zixin-Tang/DRIBO

DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

Installation

Instructions