This repository contains the code for the paper "On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning" by Philipp Becker and Gerhard Neumann, published in TMLR, October 2022.
Correspondence to philipp.becker@kit.edu (or open an issue if it concerns the code).
Tested with:
- Python 3.10
- torch 1.12.1 (with CUDA 11.6)
- gym 0.26.2
- dm_control 1.0.7
Also needs:
- imageio/imageio-ffmpeg
For the full conda environment setup see `requirements.txt`.
Repository structure:
- `envs`: Wrapper around the DeepMind Control Suite, including the modifications used in the experiments (occlusions, missing observations, sensor fusion)
- `experiments`: Scripts for running the experiments (see "Running Experiments" below)
- `ssm_mbrl`
  - `mbrl`: General model-based RL functionality for agents based on the cross-entropy method (cem, i.e., PlaNet) and on latent actor-value (av, i.e., Dreamer) training
  - `rssm`: Implementation of the Recurrent State Space Model from Hafner et al. (2019) and baseline models
  - `vrkn`: Implementation of the Variational Recurrent Kalman Network
Running Experiments
The experiments can be run with the scripts in the `experiments` folder: `actor_value` for Dreamer agents and `cross_entropy_method` for PlaNet agents. The configs can be adapted directly in the scripts.
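To illustrate, a rough skeleton of such a config might look as follows; only `model_config`, `env_config`, and the fields listed in the remainder of this README come from the repository's description, while everything else (in particular the `SimpleNamespace` layout) is a placeholder sketch:

```python
# Hypothetical sketch of the config objects an experiment script adapts;
# the actual scripts may structure this differently.
from types import SimpleNamespace

model_config = SimpleNamespace(
    transition=SimpleNamespace(type="r_rssm"),  # which transition model to use
    smoothing_rssm=False,                       # filtering vs. smoothing inference
)
env_config = SimpleNamespace(
    transition_noise_std=0.0,    # aleatoric transition noise
    occluded=False,              # occlusion experiments
    subsample_img_freq=False,    # missing-observation experiments
)

# ... the actual scripts build environment, model, and agent from these configs
# and then start training ...
```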
Evaluation of the Effect of Epistemic Uncertainty on Different Smoothing Architectures (Section 4.1 in paper)
In the rssm scripts set:
- `model_config.transition.type = "r_rssm"` and `model_config.smoothing_rssm = False` for the original RSSM
- `model_config.transition.type = "r_rssm"` and `model_config.smoothing_rssm = True` for the Smooth RSSM
- `model_config.transition.type = "mcd_r_rssm"` and `model_config.smoothing_rssm = False` for the MCD RSSM
- `model_config.transition.type = "mcd_r_rssm"` and `model_config.smoothing_rssm = True` for the Smooth MCD-RSSM
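For example, continuing the config sketch above, the Smooth MCD-RSSM variant would be selected roughly like this (a sketch; the exact place where these attributes are set depends on the respective rssm script):

```python
# Select the Smooth MCD-RSSM: Monte Carlo dropout transition plus smoothing inference.
model_config.transition.type = "mcd_r_rssm"
model_config.smoothing_rssm = True
```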
In the vrkn scripts set:
- `model_config.transition.type = "vrkn"` for the VRKN
- `model_config.transition.type = "vrkn_no_mcd"` for the VRKN without epistemic uncertainty (VRKN (no MCD))
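Analogously, the ablation without epistemic uncertainty would be selected roughly like this (same assumptions as the sketch above):

```python
# VRKN without Monte Carlo dropout, i.e., without epistemic uncertainty in the transition model.
model_config.transition.type = "vrkn_no_mcd"
```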
Evaluation on Tasks where Aleatoric Uncertainty Matters (Section 4.2 in paper)
In all scripts you can adapt the configuration of the environment accordingly. The transition noise can be set via `env_config.transition_noise_std`. For the different observation types use:
Occlusions: set `env_config.occluded = True` and either `env_config.occlusion_type = "walls"` or `env_config.occlusion_type = "disks"`. The occlusions are pre-rendered and can be found here: https://drive.google.com/drive/folders/112N4_gr6XiYCWdKxC0S9fHH0BuC2fb1P?usp=sharing and need to be placed in the `envs/occlusion_data` folder (alternatively, edit the path in `envs/env_factory.py`).
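Put together, an occlusion experiment could be configured roughly as follows (a sketch continuing the config skeleton above; the noise value is purely illustrative and not taken from the paper):

```python
# Occluded observations plus additive transition noise.
env_config.transition_noise_std = 0.1  # illustrative value only
env_config.occluded = True
env_config.occlusion_type = "disks"    # or "walls"
```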
Missing Observations: set `env_config.subsample_img_freq = True`
Sensor Fusion: set `env_config.obs_type = "img_pro_pos"` (i.e., image and proprioceptive position) and `env_config.subsample_img_freq = True`
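A sensor-fusion run with subsampled images would then be configured roughly like this (same assumptions as above):

```python
# Fuse images with proprioceptive positions; images arrive only at a reduced rate.
env_config.obs_type = "img_pro_pos"
env_config.subsample_img_freq = True
```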
To reduce dependencies, configuration and logging are currently kept very primitive:
- Configuration is done directly in the scripts but can easily be extended with tools like hydra or argparse.
- Logging: the `iterate` function returns a `log_dict` with all metrics, which is printed. You can also feed it into your favorite logging tool, as sketched below.
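For instance, the metrics could be forwarded to an external tool with a small wrapper like the following (a sketch: the exact signature of `iterate` and the structure of `log_dict` may differ from what is assumed here):

```python
# Hypothetical logging glue; `experiment` stands in for whatever object in this repo
# exposes the iterate function mentioned above.
num_iterations = 100  # placeholder

for i in range(num_iterations):
    log_dict = experiment.iterate()  # assumed to return a dict of scalar metrics
    print(f"iteration {i:04d}: " + ", ".join(f"{k}: {v:.4f}" for k, v in log_dict.items()))
    # ... or forward log_dict to your preferred logging backend here ...
```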