/evaluating_bdl

Official implementation of "Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision", 2019

Primary LanguagePythonMIT LicenseMIT

evaluating_bdl

overview image

Official implementation (PyTorch) of the paper:
Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision, 2019 [arXiv] [project].
Fredrik K. Gustafsson, Martin Danelljan, Thomas B. Schön.

We propose an evaluation framework for predictive uncertainty estimation that is specifically designed to test the robustness required in real-world computer vision applications. Using the proposed framework, we perform an extensive comparison of the popular ensembling and MC-dropout methods on the tasks of depth completion and street-scene semantic segmentation. Our comparison suggests that ensembling consistently provides more reliable uncertainty estimates.

Youtube video with qualitative results:
demo video with qualitative results

If you find this work useful, please consider citing:

@article{gustafsson2019evaluating,
  title={Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision},
  author={Gustafsson, Fredrik K and Danelljan, Martin and Sch{\"o}n, Thomas B},
  journal={arXiv preprint arXiv:1906.01620},
  year={2019}
}

Acknowledgements

Index







Usage

The code has been tested on Ubuntu 16.04. Docker images are provided (see below).

depthCompletion

  • $ sudo docker pull fregu856/evaluating_bdl:pytorch_pytorch_0.4_cuda9_cudnn7_evaluating_bdl
  • Create start_docker_image_toyProblems_depthCompletion.sh containing (My username on the server is fregu482, i.e., my home folder is /home/fregu482. You will have to modify this accordingly):
#!/bin/bash

# DEFAULT VALUES
GPUIDS="0"
NAME="toyProblems_depthCompletion_GPU"

NV_GPU="$GPUIDS" nvidia-docker run -it --rm --shm-size 12G \
        -p 5700:5700\
        --name "$NAME""0" \
        -v /home/fregu482:/root/ \
        fregu856/evaluating_bdl:pytorch_pytorch_0.4_cuda9_cudnn7_evaluating_bdl bash
  • (Inside the image, /root/ will now be mapped to /home/fregu482, i.e., $ cd -- takes you to the regular home folder)

  • (To create more containers, change the lines GPUIDS="0", --name "$NAME""0" and -p 5700:5700)

  • General Docker usage:

    • To start the image:
      • $ sudo sh start_docker_image_toyProblems_depthCompletion.sh
    • To commit changes to the image:
      • Open a new terminal window.
      • $ sudo docker commit toyProblems_depthCompletion_GPU0 fregu856/evaluating_bdl:pytorch_pytorch_0.4_cuda9_cudnn7_evaluating_bdl
    • To exit the image without killing running code:
      • Ctrl + P + Q
    • To get back into a running image:
      • $ sudo docker attach toyProblems_depthCompletion_GPU0
  • Download the KITTI depth completion dataset (data_depth_annotated.zip, data_depth_selection.zip and data_depth_velodyne.zip) and place it in /root/data/kitti_depth (/root/data/kitti_depth should contain the folders train, val and depth_selection).

  • Create /root/data/kitti_raw and download the KITTI raw dataset using download_kitti_raw.py.

  • Create /root/data/kitti_rgb. For each folder in /root/data/kitti_depth/train (e.g. 2011_09_26_drive_0001_sync), copy the corresponding folder in /root/data/kitti_raw and place it in /root/data/kitti_rgb/train.

  • Download the virtual KITTI dataset (vkitti_1.3.1_depthgt.tar and vkitti_1.3.1_rgb.tar) and place in /root/data/virtualkitti (/root/data/virtualkitti should contain the folders vkitti_1.3.1_depthgt and vkitti_1.3.1_rgb).

  • Example usage:

$ sudo sh start_docker_image_toyProblems_depthCompletion.sh
$ cd --
$ python evaluating_bdl/depthCompletion/ensembling_train_virtual.py



segmentation

  • $ sudo docker pull fregu856/evaluating_bdl:rainbowsecret_pytorch04_20180905_evaluating_bdl
  • Create start_docker_image_segmentation.sh containing (My username on the server is fregu482, i.e., my home folder is /home/fregu482. You will have to modify this accordingly):
#!/bin/bash

# DEFAULT VALUES
GPUIDS="0,1"
NAME="segmentation_GPU"

NV_GPU="$GPUIDS" nvidia-docker run -it --rm --shm-size 12G \
        -p 5900:5900 \
        --name "$NAME""01" \
        -v /home/fregu482:/home/ \
        fregu856/evaluating_bdl:rainbowsecret_pytorch04_20180905_evaluating_bdl bash
  • (Inside the image, /home/ will now be mapped to /home/fregu482, i.e., $ cd home takes you to the regular home folder)

  • (To create more containers, change the lines GPUIDS="0,1", --name "$NAME""01" and -p 5900:5900)

  • General Docker usage:

    • To start the image:
      • $ sudo sh start_docker_image_segmentation.sh
    • To commit changes to the image:
      • Open a new terminal window.
      • $ sudo docker commit segmentation_GPU01 fregu856/evaluating_bdl:rainbowsecret_pytorch04_20180905_evaluating_bdl
    • To exit the image without killing running code:
      • Ctrl + P + Q
    • To get back into a running image:
      • $ sudo docker attach segmentation_GPU01
  • Download resnet101-imagenet.pth from here and place it in segmentation.

  • Download the Cityscapes dataset and place it in /home/data/cityscapes (/home/data/cityscapes should contain the folders leftImg8bit and gtFine).

  • Download the Synscapes dataset and place it in /home/data/synscapes (/home/data/synscapes should contain the folder img, which in turn should contain the folders rgb-2k and class).

  • Run segmentation/utils/preprocess_synscapes.py (This will, among other things, create /home/data/synscapes_meta/train_img_ids.pkl and /home/data/synscapes_meta/val_img_ids.pkl by randomly selecting subsets of examples. The ones used in the paper are found in segmentation/lists/synscapes).

  • Example usage:

$ sudo sh start_docker_image_segmentation.sh
$ cd home
$ /root/miniconda3/bin/python evaluating_bdl/segmentation/ensembling_train_syn.py



toyRegression

  • Example usage:
$ sudo sh start_docker_image_toyProblems_depthCompletion.sh
$ cd --
$ python evaluating_bdl/toyRegression/Ensemble-Adam/train.py 



toyClassification

  • Example usage:
$ sudo sh start_docker_image_toyProblems_depthCompletion.sh
$ cd --
$ python evaluating_bdl/toyClassification/Ensemble-Adam/train.py 






Documentation

Documentation/depthCompletion

  • Example usage:
$ sudo sh start_docker_image_toyProblems_depthCompletion.sh
$ cd --
$ python evaluating_bdl/depthCompletion/ensembling_train_virtual.py
  • criterion.py: Definitions of losses and metrics.

  • datasets.py: Definitions of datasets, for KITTI depth completion (KITTI) and virtualKITTI.

  • model.py: Definition of the CNN.

  • model_mcdropout.py: Definition of the CNN, with inserted dropout layers.

  • %%%%%

  • ensembling_train.py: Code for training M model.py models, on KITTI train.

  • ensembling_train_virtual.py: As above, but on virtualKITTI train.

  • ensembling_eval.py: Computes the loss and RMSE for a trained ensemble, on KITTI val. Also creates visualization images of the input data, ground truth, prediction and the estimated uncertainty.

  • ensembling_eval_virtual.py: As above, but on virtualKITTI val.

  • ensembling_eval_auce.py: Computes the AUCE (mean +- std) for M = [1, 2, 4, 8, 16, 32] on KITTI val, based on a total of 33 trained ensemble members. Also creates calibration plots.

  • ensembling_eval_auce_virtual.py: As above, but on virtualKITTI val.

  • ensembling_eval_ause.py: Computes the AUSE (mean +- std) for M = [1, 2, 4, 8, 16, 32] on KITTI val, based on a total of 33 trained ensemble members. Also creates sparsification plots and sparsification error curves.

  • ensembling_eval_ause_virtual.py: As above, but on virtualKITTI val.

  • ensembling_eval_seq.py: Creates visualization videos (input data, ground truth, prediction and the estimated uncertainty) for a trained ensemble, on all sequences in KITTI val.

  • ensembling_eval_seq_virtual.py: As above, but on all sequences in virtualKITTI val.

  • %%%%%

  • mcdropout_train.py: Code for training M model_mcdropout.py models, on KITTI train.

  • mcdropout_train_virtual.py: As above, but on virtualKITTI train.

  • mcdropout_eval.py: Computes the loss and RMSE for a trained MC-dropout model with M forward passes, on KITTI val. Also creates visualization images of the input data, ground truth, prediction and the estimated uncertainty.

  • mcdropout_eval_virtual.py: As above, but on virtualKITTI val.

  • mcdropout_eval_auce.py: Computes the AUCE (mean +- std) for M = [1, 2, 4, 8, 16, 32] forward passes on KITTI val, based on a total of 16 trained MC-dropout models. Also creates calibration plots.

  • mcdropout_eval_auce_virtual.py: As above, but on virtualKITTI val.

  • mcdropout_eval_ause.py: Computes the AUSE (mean +- std) for M = [1, 2, 4, 8, 16, 32] forward passes on KITTI val, based on a total of 16 trained MC-dropout models. Also creates sparsification plots and sparsification error curves.

  • mcdropout_eval_ause_virtual.py: As above, but on virtualKITTI val.

  • mcdropout_eval_seq.py: Creates visualization videos (input data, ground truth, prediction and the estimated uncertainty) for a trained MC-dropout model with M forward passes, on all sequences in KITTI val.

  • mcdropout_eval_seq_virtual.py: As above, but on all sequences in virtualKITTI val.




Documentation/segmentation

  • Example usage:
$ sudo sh start_docker_image_segmentation.sh
$ cd home
$ /root/miniconda3/bin/python evaluating_bdl/segmentation/ensembling_train_syn.py
  • models:

      • model.py: Definition of the CNN.
      • model_mcdropout.py: Definition of the CNN, with inserted dropout layers.
      • aspp.py: Definition of the ASPP module.
      • resnet_block.py: Definition of a ResNet block.
  • utils:

      • criterion.py: Definition of the cross-entropy loss.
      • preprocess_synscapes.py: Creates the Synscapes train (val) dataset by randomly selecting a subset of 2975 (500) examples, and resizes the labels to 1024 x 2048.
      • utils.py: Helper functions for evaluation and visualization.
  • datasets.py: Definitions of datasets, for Cityscapes and Synscapes.

  • %%%%%

  • ensembling_train.py: Code for training M model.py models, on Cityscapes train.

  • ensembling_train_syn.py: As above, but on Synscapes train.

  • ensembling_eval.py: Computes the mIoU for a trained ensemble, on Cityscapes val. Also creates visualization images of the input image, ground truth, prediction and the estimated uncertainty.

  • ensembling_eval_syn.py: As above, but on Synscapes val.

  • ensembling_eval_ause_ece.py: Computes the AUSE (mean +- std) and ECE (mean +- std) for M = [1, 2, 4, 8, 16] on Cityscapes val, based on a total of 26 trained ensemble members. Also creates sparsification plots, sparsification error curves and reliability diagrams.

  • ensembling_eval_ause_ece_syn.py: As above, but on Synscapes val.

  • ensembling_eval_seq.py: Creates visualization videos (input image, prediction and the estimated uncertainty) for a trained ensemble, on the three demo sequences in Cityscapes.

  • ensembling_eval_seq_syn.py: Creates a visualization video (input image, ground truth, prediction and the estimated uncertainty) for a trained ensemble, showing the 30 first images in Synscapes val.

  • %%%%%

  • mcdropout_train.py: Code for training M model_mcdropout.py models, on Cityscapes train.

  • mcdropout_train_syn.py: As above, but on Synscapes train.

  • mcdropout_eval.py: Computes the mIoU for a trained MC-dropout model with M forward passes, on Cityscapes val. Also creates visualization images of the input image, ground truth, prediction and the estimated uncertainty.

  • mcdropout_eval_syn.py: As above, but on Synscapes val.

  • mcdropout_eval_ause_ece.py: Computes the AUSE (mean +- std) and ECE (mean +- std) for M = [1, 2, 4, 8, 16] forward passes on Cityscapes val, based on a total of 8 trained MC-dropout models. Also creates sparsification plots, sparsification error curves and reliability diagrams.

  • mcdropout_eval_ause_ece_syn.py: As above, but on Synscapes val.

  • mcdropout_eval_seq.py: Creates visualization videos (input image, prediction and the estimated uncertainty) for a trained MC-dropout model with M forward passes, on the three demo sequences in Cityscapes.

  • mcdropout_eval_seq_syn.py: Creates a visualization video (input image, ground truth, prediction and the estimated uncertainty) for a trained MC-dropout model with M forward passes, showing the 30 first images in Synscapes val.




Documentation/toyRegression

  • Example usage:
$ sudo sh start_docker_image_toyProblems_depthCompletion.sh
$ cd --
$ python evaluating_bdl/toyRegression/Ensemble-Adam/train.py 
  • Ensemble-Adam:

    • Ensembling by minimizing the MLE objective using Adam and random initialization.
      • datasets.py: Definition of the training dataset.
      • model.py: Definition of the feed-forward neural network.
      • train.py: Code for training M models.
      • eval.py: Creates a plot of the obtained predictive distribution and the HMC "ground truth" predictive distribution, for a set value of M. Also creates histograms for the model parameters.
      • eval_plots.py: Creates plots of the obtained predictive distributions for different values of M.
      • eval_kl_div.py: Computes the KL divergence between the obtained predictive distribution and the HMC "ground truth", for different values of M.
  • Ensemble-MAP-Adam:

      • Ensembling by minimizing the MAP objective using Adam and random initialization.
  • Ensemble-MAP-Adam-Fixed:

      • Ensembling by minimizing the MAP objective using Adam and NO random initialization.
  • Ensemble-MAP-SGD:

      • Ensembling by minimizing the MAP objective using SGD and random initialization.
  • Ensemble-MAP-SGDMOM:

      • Ensembling by minimizing the MAP objective using SGDMOM and random initialization.
  • MC-Dropout-MAP-02-Adam:

      • MC-dropout by minimizing the MAP objective using Adam, p=0.2.
  • MC-Dropout-MAP-02-SGD

      • MC-dropout by minimizing the MAP objective using SGD, p=0.2.
  • MC-Dropout-MAP-02-SGDMOM:

      • MC-dropout by minimizing the MAP objective using SGDMOM, p=0.2.
  • SGLD-256:

      • Implementation of SGLD, trained for 256 times longer than each member of an ensemble.
  • SGLD-64:

      • Implementation of SGLD, trained for 64 times longer than each member of an ensemble..
  • SGHMC-256:

      • Implementation of SGHMC, trained for 256 times longer than each member of an ensemble.
  • SGHMC-64:

      • Implementation of SGHMC, trained for 64 times longer than each member of an ensemble.
  • HMC:

      • Implementation of HMC using Pyro.
  • Deterministic:

      • Implementation of a fully deterministic model, i.e., direct regression.



Documentation/toyClassification

  • Example usage:
$ sudo sh start_docker_image_toyProblems_depthCompletion.sh
$ cd --
$ python evaluating_bdl/toyClassification/Ensemble-Adam/train.py 
  • Ensemble-Adam:

    • Ensembling by minimizing the MLE objective using Adam and random initialization.
      • datasets.py: Definition of the training dataset.
      • model.py: Definition of the feed-forward neural network.
      • train.py: Code for training M models.
      • eval.py: Creates a plot of the obtained predictive distribution and the HMC "ground truth" predictive distribution, for a set value of M. Also creates histograms for the model parameters.
      • eval_plots.py: Creates plots of the obtained predictive distributions for different values of M.
      • eval_kl_div.py: Computes the KL divergence between the obtained predictive distribution and the HMC "ground truth", for different values of M.
  • Ensemble-Adam-Fixed:

      • Ensembling by minimizing the MLE objective using Adam and NO random initialization.
  • Ensemble-MAP-Adam:

      • Ensembling by minimizing the MAP objective using Adam and random initialization.
  • Ensemble-MAP-SGD:

      • Ensembling by minimizing the MAP objective using SGD and random initialization.
  • Ensemble-MAP-SGDMOM:

      • Ensembling by minimizing the MAP objective using SGDMOM and random initialization.
  • MC-Dropout-MAP-01-Adam:

      • MC-dropout by minimizing the MAP objective using Adam, p=0.1.
  • MC-Dropout-MAP-02-SGD

      • MC-dropout by minimizing the MAP objective using SGD, p=0.2.
  • MC-Dropout-MAP-02-SGDMOM:

      • MC-dropout by minimizing the MAP objective using SGDMOM, p=0.2.
  • SGLD-256:

      • Implementation of SGLD, trained for 256 times longer than each member of an ensemble.
  • SGLD-64:

      • Implementation of SGLD, trained for 64 times longer than each member of an ensemble..
  • SGHMC-256:

      • Implementation of SGHMC, trained for 256 times longer than each member of an ensemble.
  • SGHMC-64:

      • Implementation of SGHMC, trained for 64 times longer than each member of an ensemble.
  • HMC:

      • Implementation of HMC using Pyro.






Pretrained models