Why Normalizing Flows Fail to Detect Out-of-Distribution Data

This repository contains experiments for the paper

Why Normalizing Flows Fail to Detect Out-of-Distribution Data

by Polina Kirichenko, Pavel Izmailov and Andrew Gordon Wilson.

Introduction

In the paper, we show that the inductive biases of normalizing flows (implicit assumptions made by their architecture and training procedure) can hinder OOD detection.

  • We show that flows learn latent representations for images largely based on local pixel correlations, rather than semantic content, making it difficult to detect data with anomalous semantics.
  • We identify mechanisms through which normalizing flows can simultaneously increase likelihood for all structured images.
  • We show that by changing the architectural details of the coupling layers, we can encourage flows to learn transformations specific to the target data, improving OOD detection (see the coupling-layer sketch after this list).
  • We show that OOD detection is improved when flows are trained on high-level features which contain semantic information extracted from image datasets.
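
As background for the coupling-layer point above, here is a minimal RealNVP-style affine coupling layer in PyTorch. This is an illustrative sketch only; the class name and network sizes are ours, not the ones used in this repository. Half of the input passes through unchanged, while the other half is scaled and shifted by functions of the first half, so the Jacobian log-determinant is cheap to compute.

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # Illustrative RealNVP-style coupling layer; not this repo's implementation.
    def __init__(self, dim, hidden=256):
        super().__init__()
        half = dim // 2
        # st_net predicts the log-scale s and the shift t from the unchanged half.
        self.st_net = nn.Sequential(
            nn.Linear(half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * half),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.st_net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                      # bound the scales for stability
        y2 = x2 * torch.exp(s) + t             # affine transform of the second half
        log_det = s.sum(dim=1)                 # log |det J| is just the sum of s
        return torch.cat([x1, y2], dim=1), log_det

The --st_type flag in the training commands below selects the architecture of this s, t network, which is the architectural detail the paper varies.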

In this repository we provide PyTorch code for reproducing results in the paper.

Data Preparation

The following datasets need to be downloaded manually. You can then use the path to the data folder as DATA_PATH in the scripts below.

  • CelebA
  • NotMNIST
  • ImageNet 64x64 (we will add ImageNet scripts soon)

The other datasets can be automatically downloaded to DATA_PATH when you run the scripts below.
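
For the automatically downloaded datasets, the standard torchvision downloaders are sufficient. A minimal sketch, assuming DATA_PATH is the same folder you pass as --data_path (the training scripts use their own loaders; this only shows how the data can be fetched):

import torchvision.datasets as datasets

DATA_PATH = "./data"  # the folder passed as --data_path below

# These datasets are fetched automatically on first use.
datasets.FashionMNIST(root=DATA_PATH, train=True, download=True)
datasets.MNIST(root=DATA_PATH, train=True, download=True)
datasets.CIFAR10(root=DATA_PATH, train=True, download=True)
datasets.SVHN(root=DATA_PATH, split="train", download=True)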

Training RealNVP and Glow models

The scripts for training flow models are in the experiments/train_flows/ folder.

  • train_unsup.py — the standard script for training flows
  • train_unsup_ood.py — same as train_unsup.py, but evaluates likelihoods on OOD data during training (see the likelihood sketch after this list)
  • train_unsup_ood_negative.py — same as train_unsup_ood.py, but minimizes likelihood on OOD data (Appendix B)
  • train_unsup_ood_uci.py — same as train_unsup_ood.py, but for tabular data (Appendix K)
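
The OOD evaluation these scripts perform reduces to comparing model log-likelihoods, commonly reported for flows in bits per dimension (bpd). A minimal sketch, assuming a flow object exposing a log_prob(x) method; the actual interface in this codebase may differ:

import math
import torch

def bits_per_dim(flow, x):
    # Convert log-likelihood to bits per dimension; lower bpd = higher likelihood.
    log_px = flow.log_prob(x)          # log p(x) for each image, shape (batch,)
    num_dims = x[0].numel()            # e.g. 3 * 32 * 32 for CIFAR-10
    return -log_px / (num_dims * math.log(2))

# The failure mode studied in the paper: bpd on OOD images (e.g. SVHN) can be
# *lower* than bpd on the training distribution (e.g. CIFAR-10).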

Commands used to train the baseline models:

# RealNVP on FashionMNIST
python3 train_unsup.py --dataset=[FashionMNIST | MNIST] --data_path=DATA_PATH --save_freq=20 \
  --flow=RealNVP --logdir=LOG_DIR --ckptdir=CKPTS_DIR --num_epochs=81 --lr=5e-5 \
  --prior=Gaussian --num_blocks=6 --batch_size=32

# RealNVP on CelebA
python3 train_unsup.py --dataset=[CelebA | CIFAR10 | SVHN] --data_path=DATA_PATH --logdir=LOG_DIR \
  --ckptdir=CKPTS_DIR --num_epochs=101 --lr=1e-4 --batch_size=32 --num_blocks=8 \
  --weight_decay=5e-5 --num_scales=3

# Glow on FashionMNIST
python3 train_unsup.py --dataset=[FashionMNIST | MNIST] --data_path=DATA_PATH --flow=Glow \
  --logdir=LOG_DIR --ckptdir=CKPTS_DIR --num_epochs=151 --lr=5e-5 --batch_size=32 \
  --optim=RMSprop --num_scales=2 --num_coupling_layers_per_scale=16 \
  --st_type=highway --num_blocks=3 --num_mid_channels=200
  
# Glow on CelebA
python3 train_unsup.py --dataset=[CelebA | CIFAR10 | SVHN] --data_path=DATA_PATH --flow=Glow \
  --logdir=LOG_DIR --ckptdir=CKPTS_DIR --num_epochs=161 --lr=1e-5 --batch_size=32 \
  --optim=RMSprop --num_scales=3 --num_coupling_layers_per_scale=8 \
  --st_type=highway --num_blocks=3 --num_mid_channels=400

Coupling layer and latent space visualizations

We provide example notebooks in experiments/notebooks/:

  • GLOW_fashion.ipynb — Glow for FashionMNIST
  • realnvp_celeba.ipynb — RealNVP for CelebA

Below we show latent representations learned by RealNVP trained on FashionMNIST and CelebA for in-distribution and OOD inputs.
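
The key step behind these visualizations is mapping inputs to latent space. A rough sketch, assuming the flow's forward pass returns the latent code along with the log-determinant (the notebooks above contain the exact calls):

import torch

@torch.no_grad()
def to_latent(flow, x):
    # Map a batch of images to their latent representations z = f(x).
    z, _log_det = flow(x)  # assumed interface; see the notebooks for the real one
    return z

# Reshaping z back to image shape and plotting it side by side for
# in-distribution and OOD inputs shows that the latent codes mirror the
# local structure of the inputs rather than their semantic content.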

Negative Training

To reproduce the experiments in Appendix B, you can use the script train_unsup_ood_negative.py, e.g. to maximize likelihood on CIFAR-10 and minimize likelihood on CelebA:

python3 train_unsup_ood_negative.py --ood_dataset=CelebA --ood_data_path=OOD_DATA_PATH --dataset=CIFAR10 \
  --data_path=DATA_PATH --logdir=LOG_DIR --ckptdir=CKPTS_DIR --num_epochs=101 --lr=5e-5 --batch_size=32 \
  --num_blocks=8 --num_scales=3 --negative_val=-100000 --save_freq=10 --flow=RealNVP

For other dataset pairs, reuse the hyperparameters of the baseline models and set --negative_val to -100000 for CIFAR-10, CelebA, and SVHN, and to -30000 for FashionMNIST and MNIST.
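
For intuition, one plausible form of the negative-training objective is sketched below: maximize log-likelihood on in-distribution data while penalizing OOD log-likelihood, with negative_val acting as a floor so the OOD term cannot diverge. This is our reading of the flag, not necessarily the exact loss in train_unsup_ood_negative.py:

import torch.nn.functional as F

def negative_training_loss(flow, x_in, x_ood, negative_val=-100000.0):
    # Maximize likelihood on in-distribution data ...
    ll_in = flow.log_prob(x_in)
    ll_ood = flow.log_prob(x_ood)
    # ... while pushing OOD log-likelihood down, but only while it is
    # still above the floor negative_val (the hinge keeps the loss bounded).
    return -ll_in.mean() + F.relu(ll_ood - negative_val).mean()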

References

The implementation of RealNVP and Glow was adapted from the repo for the paper Semi-Supervised Learning with Normalizing Flows.