- We express our thanks to Oxford and Imperial College London researchers, who showed how our PI-Dropout can be used for large gains over the state-of-the-art in reinforcement learning in their recent paper at the Beyond tabula rasa in RL workshop at ICLR 2020.
Deep Learning under Privileged Information Using Heteroscedastic Dropout (CVPR 2018, Official Repo)
This is the code for the paper:
Deep Learning Under Privileged Information Using Heteroscedastic Dropout
John Lambert*,
Ozan Sener*,
Silvio Savarese
Presented at CVPR 2018
The paper can be found on ArXiv here.
This repository also includes an implementation for repeatable random data augmentation transformations, useful for transforming images and bounding boxes contained therein identically.
- The DLUPI models used in the paper
- Code for training new feedforward CNN models
- Code for training new feedforward RNN models
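The repeatable random transformations mentioned above rest on one idea: draw the random parameters once, then apply the same parameters to the image and to its bounding boxes. The following is a minimal sketch of that idea for a horizontal flip, in pure Python. The names `sample_flip_params`, `flip_image`, and `flip_boxes` are hypothetical and are not this repository's API.

```python
import random


def sample_flip_params(rng, p=0.5):
    """Draw the random decision once so it can be reused for every target."""
    return {"flip": rng.random() < p}


def flip_image(image, params):
    """Horizontally flip an image stored as a list of pixel rows."""
    if not params["flip"]:
        return [row[:] for row in image]
    return [row[::-1] for row in image]


def flip_boxes(boxes, width, params):
    """Apply the same flip to (xmin, ymin, xmax, ymax) boxes.

    Boxes use inclusive pixel indices, so column x maps to width - 1 - x.
    """
    if not params["flip"]:
        return list(boxes)
    return [(width - 1 - xmax, ymin, width - 1 - xmin, ymax)
            for (xmin, ymin, xmax, ymax) in boxes]


# Toy 2x4 "image" with a marked region in columns 0..1 of row 0.
image = [[1, 1, 0, 0],
         [0, 0, 0, 0]]
boxes = [(0, 0, 1, 0)]

# Fixed here for illustration; normally: sample_flip_params(random.Random(seed))
params = {"flip": True}
flipped_image = flip_image(image, params)
flipped_boxes = flip_boxes(boxes, width=4, params=params)
```

Because the image and the boxes read from the same `params` dictionary, they cannot drift apart, and re-seeding the generator reproduces the same augmentation.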
If you find this code useful for your research, please cite
@InProceedings{Lambert_2018_CVPR,
author = {Lambert, John and Sener, Ozan and Savarese, Silvio},
title = {Deep Learning Under Privileged Information Using Heteroscedastic Dropout},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}
In this repository we provide:
- Top-k Multi-crop testing framework
- Top-k Single-crop testing framework
- Reproducible (repeatable) random image transformations
- Curriculum learning examples in PyTorch
- Base and derived class examples with virtual functions in Python
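As a rough illustration of the testing items above: top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes, and a common way to do multi-crop testing is to average the class scores over several crops of the same image before ranking. The sketch below uses plain Python lists in place of tensors. `average_crop_scores` and `topk_correct` are hypothetical helper names, not functions from this repository.

```python
def average_crop_scores(crop_scores):
    """Average per-class scores over crops; crop_scores is a list of score lists."""
    num_crops = len(crop_scores)
    num_classes = len(crop_scores[0])
    return [sum(scores[c] for scores in crop_scores) / num_crops
            for c in range(num_classes)]


def topk_correct(scores, target, k):
    """Return True if `target` is among the k classes with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return target in ranked[:k]


# Three crops of one image, four classes; the true class is 2.
crop_scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.3, 0.4, 0.1],
    [0.1, 0.5, 0.3, 0.1],
]
mean_scores = average_crop_scores(crop_scores)
top1 = topk_correct(mean_scores, target=2, k=1)  # class 1 ranks first, so False
top2 = topk_correct(mean_scores, target=2, k=2)  # class 2 ranks second, so True
```

Single-crop testing is the special case where `crop_scores` holds exactly one list.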
We also provide implementations of various baselines that use privileged information, including:
- J. Hoffman, S. Gupta, and T. Darrell. Learning with Side Information through Modality Hallucination. In CVPR, 2016.
- Y. Chen, X. Jin, J. Feng, and S. Yan. Training Group Orthogonal Neural Networks with Privileged Information. In IJCAI, 2017. Pages 1532-1538. https://doi.org/10.24963/ijcai.2017/212.
- H. Yang, J. Zhou, J. Cai, and Y.S. Ong. MIML-FCN+: Multi-Instance Multi-Label Learning via Fully Convolutional Networks With Privileged Information. In CVPR, 2017.
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. In JMLR, 2014. Pages 1929−1958.
- A. Achille, S. Soatto. Information Dropout: learning optimal representations through noisy computation. Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2018.
- K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR, 2015.
All code is implemented in PyTorch.
First install PyTorch, torchvision, and CUDA, then update or install the following packages (with Conda and Python 2.7 on Linux, the command will look something like this):
conda install pytorch torchvision -c pytorch
If you have an NVIDIA GPU, you can accelerate all operations with CUDA.
First install CUDA.
When using CUDA, you can use cuDNN to accelerate convolutions.
First download cuDNN and copy the libraries to /usr/local/cuda/lib64/.
First, register and create an ImageNet account.
Next, download the 1.28 million images.
Now we need to download the XML bounding box annotations (42.8 MB in size), either via the link here or via the command line:
wget http://image-net.org/Annotation/Annotation.tar.gz
The XML annotations are stored as nested tar.gz archives. They can be unpacked with tar, which takes around 10 minutes on a typical workstation:
mkdir bbox_annotation
tar -xvzf Annotation.tar.gz -C bbox_annotation
rm Annotation.tar.gz
cd bbox_annotation
for a in *.tar.gz; do gzip -dc "$a" | tar xf -; done
rm *.tar.gz
Now we have a directory called bbox_annotation/Annotation that contains .xml files with bounding box information for 3,627 classes ("synsets") of ImageNet. We will use only the 1000 classes featured in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) task.
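The annotation files follow the PASCAL VOC XML layout, with one `object` element per bounding box. As a minimal sketch of reading one, the snippet below parses an inline sample with the Python standard library. The sample XML is made up for illustration, and `parse_bboxes` is a hypothetical helper rather than part of this repository.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in PASCAL VOC style, trimmed to the fields used below.
SAMPLE_XML = """
<annotation>
  <filename>n02500267_2597</filename>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>n02500267</name>
    <bndbox>
      <xmin>48</xmin><ymin>30</ymin><xmax>320</xmax><ymax>290</ymax>
    </bndbox>
  </object>
</annotation>
"""


def parse_bboxes(xml_text):
    """Return a list of (synset, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        bnd = obj.find("bndbox")
        box = tuple(int(bnd.find(tag).text)
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text, box))
    return boxes


boxes = parse_bboxes(SAMPLE_XML)
```

For a file on disk, `ET.parse(path).getroot()` would replace the `ET.fromstring` call.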
At this point, we'll arrange the image data into three folders: "train", "val", and "test".
6.3G val.zip
56G train.zip
On the ILSVRC 2016 page on the ImageNet website, find and download the file named
ILSVRC2016_CLS-LOC.tar.gz
This is the Classification-Localization dataset (155 GB), unchanged since ILSVRC2012. There are a total of 1,281,167 images for training. The number of images for each synset (category) ranges from 732 to 1,300. There are 50,000 validation images, with 50 images per synset. There are 100,000 test images. All images are in JPEG format.
It is arranged as follows: {split}/{synset_name}/{file_name}.JPEG
For example, ImageNet_2012/train/n02500267/02500267_2597.JPEG
We will use only the CLS-LOC images that have bounding box annotations, and then take subsets of those images to evaluate sample efficiency. Run:
mkdir ImageNetLocalization
python cnns/imagenet/create_bbox_dataset.py
python cnns/imagenet/create_imagenet_test_set.py
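To make the sample-efficiency setup concrete, the sketch below draws a fixed number of training images per class from a mapping of synset to file names, using a seeded generator so the subset is reproducible. It illustrates the idea only and assumes nothing about how `create_bbox_dataset.py` works. `subsample_per_class` and the toy synsets and file names are hypothetical.

```python
import random


def subsample_per_class(files_by_synset, num_per_class, seed=0):
    """Pick up to `num_per_class` files per synset, reproducibly."""
    rng = random.Random(seed)
    subset = {}
    # Sort synsets and files so the result does not depend on dict or disk order.
    for synset in sorted(files_by_synset):
        files = sorted(files_by_synset[synset])
        k = min(num_per_class, len(files))
        subset[synset] = sorted(rng.sample(files, k))
    return subset


# Toy data: one class with ten images, one with only three.
files_by_synset = {
    "n00000001": ["a_%d.JPEG" % i for i in range(10)],
    "n00000002": ["b_%d.JPEG" % i for i in range(3)],
}
subset = subsample_per_class(files_by_synset, num_per_class=5, seed=0)
```

Classes with fewer images than requested keep all of their images, so the per-class count is an upper bound.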
The script train.py lets you train a new CNN model from scratch.
python cnns/train/train.py
By default this script runs on GPU; to run on CPU, remove the .cuda() lines within the code.
Free for personal or research use; for commercial use please contact me.
File explaining some of the model names: https://docs.google.com/document/d/1KBjYK52Jvcd8cYpIZPPRUrXGSu6jFsBL5O3FwvtBr_Q/edit?usp=sharing
- X* 75K, ADAPTIVE DECAY, Model link
{ 'model_type' : ModelType.DROPOUT_FN_OF_XSTAR,
'model_fpath': '/vision/group/ImageNetLocalization/saved_imagenet_models/ImageNet_Localization_1000_Class_75perclass_identity_ATtest_lambda100_BS64'
},
- Modal. Hallucination shared params RGB MASK, Model link
{ 'model_type' : ModelType.MODALITY_HALLUC_SHARED_PARAMS,
'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_25_05_46_34_num_ex_per_cls_75_bs_128_optimizer_type_sgd_model_type_ModelType.MODALITY_HALLUC_SHARED_PARAMS_lr_0.01_fixlrsched_False'},
- MIML-FCN/VGG RGB MASK Model link
{ 'model_type' : ModelType.MIML_FCN_VGG,
'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_25_05_31_49_num_ex_per_cls_75_bs_128_optimizer_type_sgd_model_type_ModelType.MIML_FCN_VGG_lr_0.01_fixlrsched_False'},
- MIML-FCN [40]/ResNet RGB MASK Model link
{ 'model_type' : ModelType.MIML_FCN_RESNET,
'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_25_02_59_54_num_ex_per_cls_75_bs_256_optimizer_type_sgd_model_type_ModelType.MIML_FCN_RESNET_lr_0.1_fixlrsched_False'},
- GoCNN, VGG, scale coeff down by 320 Model link
{ 'model_type' : ModelType.GO_CNN_VGG,
'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_24_04_20_02_num_ex_per_cls_75_bs_256_optimizer_type_adam_model_type_ModelType.GO_CNN_VGG_lr_0.001_fixlrsched_False'},
- Random Gaussian Dropout Model link
{ 'model_type' : ModelType.DROPOUT_RANDOM_GAUSSIAN_NOISE,
'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_22_07_28_56_num_ex_per_cls_75_bs_256_optimizer_type_sgd_model_type_ModelType.DROPOUT_RANDOM_GAUSSIAN_NOISE_lr_0.01_fixlrsched_False'},
- NO X* 75k adaptive decay, bs = 256 , Model link
{ 'model_type' : ModelType.DROPOUT_BERNOULLI,
'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_07_13_20_12_num_ex_per_cls_75_bs_256_optimizer_type_sgd_dropout_type_bernoulli_lr_0.01_fixlrsched_False'},
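Several of the saved-model directory names above encode the training settings as `key_value` pairs. As a convenience, the sketch below pulls those settings out with a regular expression. The field meanings (for example, `bs` as batch size) are inferred from the names and are an assumption, and `parse_run_name` is a hypothetical helper, not part of this repository. Names that do not follow this exact pattern, such as the first and last entries, return `None`.

```python
import re

# Pattern inferred from names such as
# 2018_03_25_05_31_49_num_ex_per_cls_75_bs_128_optimizer_type_sgd_model_type_...
RUN_NAME_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}(?:_\d{2}){5})"
    r"_num_ex_per_cls_(?P<num_ex_per_cls>\d+)"
    r"_bs_(?P<batch_size>\d+)"
    r"_optimizer_type_(?P<optimizer>[a-z]+)"
    r"_model_type_ModelType\.(?P<model_type>[A-Z_]+)"
    r"_lr_(?P<lr>[0-9.]+)"
    r"_fixlrsched_(?P<fixlrsched>True|False)$"
)


def parse_run_name(model_fpath):
    """Return the settings encoded in the last path component, or None."""
    name = model_fpath.rstrip("/").split("/")[-1]
    match = RUN_NAME_PATTERN.search(name)
    if match is None:
        return None
    info = match.groupdict()
    info["num_ex_per_cls"] = int(info["num_ex_per_cls"])
    info["batch_size"] = int(info["batch_size"])
    info["lr"] = float(info["lr"])
    info["fixlrsched"] = info["fixlrsched"] == "True"
    return info


info = parse_run_name(
    "/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/"
    "2018_03_25_05_31_49_num_ex_per_cls_75_bs_128_optimizer_type_sgd_"
    "model_type_ModelType.MIML_FCN_VGG_lr_0.01_fixlrsched_False"
)
```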