Iso-Dream++: Model-Based Reinforcement Learning with Isolated Imaginations (TPAMI)

Code repository for the paper:

Model-Based Reinforcement Learning with Isolated Imaginations [arxiv]

Minting Pan, Xiangming Zhu, Yitao Zheng, Yunbo Wang, Xiaokang Yang

Description

This work is an extended version of our previous work, Iso-Dream (NeurIPS 2022). We propose a novel world model that uses modular network structures and inverse dynamics to separate mixed visual dynamics into controllable and noncontrollable components, allowing the model to roll out each source of dynamics independently. Furthermore, we introduce a new actor-critic algorithm that makes future-dependent decisions: its action model rolls out the noncontrollable dynamics into the future and uses an attention mechanism to learn their influence on current behavior, so that the agent can thoroughly consider possible future interactions with the environment (see the sketch at the end of this section). This work improves Iso-Dream in the following three aspects:

1. Min-max variance constraints

We propose min-max variance constraints to isolate different dynamics in an unsupervised manner and to prevent information from collapsing into a single state transition branch. The key idea is to encourage the action-conditioned branch to produce different state transitions given the same state and distinct actions, while penalizing the diversity of state transitions in the action-free branch.
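
As a rough illustration, the constraint can be written as a single loss term. The following is a minimal PyTorch sketch under assumed interfaces; prior_ctrl, prior_free, and the tensor shapes are hypothetical, not the repository's actual API:

    import torch

    def minmax_variance_loss(prior_ctrl, prior_free, state, actions):
        # Hypothetical transition functions: prior_ctrl / prior_free stand in
        # for the action-conditioned and action-free branches, each mapping
        # (state, action) -> next deterministic state of shape [B, D].
        # state:   [B, D]     shared deterministic state
        # actions: [K, B, A]  K distinct actions for the same state
        ctrl_next = torch.stack([prior_ctrl(state, a) for a in actions])  # [K, B, D]
        free_next = torch.stack([prior_free(state, a) for a in actions])  # [K, B, D]
        # Maximize the variance across actions in the action-conditioned
        # branch, while minimizing it in the action-free branch.
        return free_next.var(dim=0).mean() - ctrl_next.var(dim=0).mean()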

2. Sparse dependency between decoupled states

We model the sparse dependency of next-step noncontrollable dynamics on current controllable dynamics, which yields a more accurate simulation of many practical environments. A dependency gate in the action-free branch determines when the controllable dynamics affect the transition of the next noncontrollable state.
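
A minimal sketch of such a gate, again with hypothetical module and argument names rather than the repository's actual API:

    import torch
    import torch.nn as nn

    class SparseDependencyGate(nn.Module):
        # Decides when the controllable state is allowed to influence
        # the next action-free (noncontrollable) transition.
        def __init__(self, ctrl_dim, free_dim):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Linear(ctrl_dim + free_dim, 1),
                nn.Sigmoid(),  # scalar gate in (0, 1); 0 = no dependency
            )

        def forward(self, ctrl_state, free_state):
            g = self.gate(torch.cat([ctrl_state, free_state], dim=-1))
            # The gated controllable state is what the action-free
            # transition sees; when g is near zero, the branches evolve
            # independently.
            return g * ctrl_state, g

A sparsity regularizer on the gate values (e.g., their L1 norm) can then be added to the world-model objective to keep the dependency sparse.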

3. Transfer learning

We extend the experiments by including CARLA in night mode and the DeepMind Control Suite with video_hard backgrounds for transfer learning. The isolation of controllable state transitions further facilitates transfer learning across different but related domains: we can adapt parts of the world model to a novel domain based on our prior knowledge of the domain gap.
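
To make the future-dependent decision-making described above concrete, here is a minimal sketch of an actor that attends over a window of imagined noncontrollable states before emitting an action. All names and shapes are illustrative assumptions, not the repository's actual interfaces:

    import torch
    import torch.nn as nn

    class FutureAwareActor(nn.Module):
        # Policy conditioned on the current controllable state and an
        # imagined rollout of future noncontrollable states.
        def __init__(self, ctrl_dim, free_dim, act_dim):
            super().__init__()
            self.query = nn.Linear(ctrl_dim, free_dim)
            self.attn = nn.MultiheadAttention(free_dim, num_heads=1, batch_first=True)
            self.policy = nn.Linear(ctrl_dim + free_dim, act_dim)

        def forward(self, ctrl_state, rollout_free):
            # ctrl_state:   [B, ctrl_dim]    current controllable state
            # rollout_free: [B, H, free_dim] noncontrollable states imagined
            #                                H steps into the future
            q = self.query(ctrl_state).unsqueeze(1)        # [B, 1, free_dim]
            ctx, _ = self.attn(q, rollout_free, rollout_free)
            # Fuse the attended future context with the current state.
            return self.policy(torch.cat([ctrl_state, ctx.squeeze(1)], dim=-1))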

Get Started

Iso-Dream is implemented and tested on Ubuntu 18.04 with Python 3.7 and PyTorch 1.9.0:

  1. Create an environment

    conda create -n iso-env python=3.7
    conda activate iso-env
    
  2. Install dependencies

    pip install -r requirements.txt
    

CARLA / DMC

For CARLA environment:

  1. Setup

Download and set up CARLA 0.9.10:

    cd iso_rl
    chmod +x setup_carla.sh
    ./setup_carla.sh
    

    Add to your python path:

    export PYTHONPATH=$PYTHONPATH:/home/CARLA_0.9.10/PythonAPI
    export PYTHONPATH=$PYTHONPATH:/home/CARLA_0.9.10/PythonAPI/carla
    export PYTHONPATH=$PYTHONPATH:/home/CARLA_0.9.10/PythonAPI/carla/dist/carla-0.9.10-py3.7-linux-x86_64.egg
    

Then merge the directories.

  2. Training

    Terminal 1:

    cd CARLA_0.9.10
    bash CarlaUE4.sh -fps 20 -opengl
    

    Terminal 2:

    cd iso_rl
    python dreamer.py --logdir log/iso_carla --sz_sparse True --min_free True --max_action True --seed 9 --configs defaults carla
    
  3. Evaluation

    cd iso_rl
    python test.py --logdir test --sz_sparse True --min_free True --max_action True --configs defaults carla
    

For DMC environment:

  1. Setup DMC with video background

Download 'envs' from Google Drive and put it in the 'iso_rl' directory. The dependencies can then be installed with the following commands:

    cd iso_rl
    
    cd ./envs/dm_control
    pip install -e .
    
    cd ../dmc2gym
    pip install -e .
    
    cd ../..
    
  2. Training

    python dreamer.py --logdir log/iso_dmc --sz_sparse False --min_free True --max_action True --seed 4 --configs defaults dmc --task dmcbg_walker_walk
    

BAIR / RoboNet

Train and test Iso-Dream on the BAIR and RoboNet datasets. Also, install TensorFlow 2.1.0 for the BAIR dataloader.
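
For example, assuming a pip-based environment:

    pip install tensorflow==2.1.0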

  1. Download BAIR data.

    wget http://rail.eecs.berkeley.edu/datasets/bair_robot_pushing_dataset_v0.tar
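    # Then extract the archive (assumed to unpack into the current directory):
    tar -xvf bair_robot_pushing_dataset_v0.tar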
    
  2. Train the model with the following bash script. The learned model will be saved in the --save_dir folder, and the generated future frames will be saved in the --gen_frm_dir folder.

    cd iso_video_prediction
    sh train_iso_model.sh
    

Acknowledgement

We appreciate the following GitHub repositories, from which we borrow code:

https://github.com/jsikyoon/dreamer-torch

https://github.com/thuml/predrnn-pytorch