
This is the official implementation of "Anomaly Detection with Deep Perceptual Autoencoders".

Primary language: Python. License: Apache License 2.0.

Anomaly Detection with Deep Perceptual Autoencoders: PyTorch Implementation


Anomaly Detection with Deep Perceptual Autoencoders
Nina Tuluptceva, Bart Bakker, Irina Fedulova, Heinrich Schulz, and Dmitry V. Dylov.
2020
https://arxiv.org/abs/2006.13265

The repository includes the experiments reported in the paper.

Structure of Project

anomaly_detection - Python package with implementations of:
    deep_geo: Deep Anomaly Detection Using Geometric Transformations (https://arxiv.org/abs/1805.10917)
    deep_if: Towards Practical Unsupervised Anomaly Detection on Retinal Images (https://link.springer.com/chapter/10.1007/978-3-030-33391-1_26)
    piad: Perceptual Image Anomaly Detection (https://arxiv.org/abs/1909.05904)
    dpa: Anomaly Detection with Deep Perceptual Autoencoders (https://arxiv.org/abs/2006.13265)
configs - YAML configs to reproduce the experiments reported in the paper
    └───deep_geo - configs used to train and eval deep_geo models
    |   |   train_example.yaml -- Example train config for the deep_geo model, with a description of all params
    |   |   eval_example.yaml -- Example evaluation config for the deep_geo model, with a description of all params
    │   └───camelyon16
    |   |   └───meta - "meta" configs with missing values (to generate configs for hyperparameter search)
    |   |   │   |   ...
    |   |   └───final - configs used in final experiments
    |   |   │   └───reproduce -- configs with default hyperparameters provided by the authors of the method (to reproduce the papers' results)
    |   |   │   │    ... 
    |   |   │   └───with_cv -- configs with hyperparameters found by cross-validation search (see paper for more detail)
    |   |   │   │    ...
    │   └───cifar10
    |   |    ...
    │   └───nih 
    |   |    ...
    │   └───svhn
    |   |    ...
    └───deep_if - configs used to train and eval deep_if models
    |   ...
    └───dpa - configs used to train dpa models
    |   ...
    └───piad - configs used to train piad models
    |   ...
camelyon16_preprocessing - scripts for preprocessing the Camelyon16 dataset
folds - folds used in cross-validation, train/test splits of NIH and Camelyon16, and validation info
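
The "meta" configs leave some values unset so that concrete configs can be generated for a hyperparameter search. The following is a minimal sketch of that idea, not the repository's generator: it assumes a config is a flat dict whose unset values are None, and the function name expand_meta_config, the keys, and the grid values are all hypothetical.

```python
import itertools


def expand_meta_config(meta, grid):
    """Yield one concrete config per combination of grid values.

    meta: dict whose unset hyperparameters are None (an assumed convention).
    grid: dict mapping every unset key to the list of values to try.
    """
    missing = [key for key, value in meta.items() if value is None]
    if set(grid) != set(missing):
        raise ValueError("grid keys must match the unset keys of the meta config")
    # Cartesian product over the candidate values of every unset key.
    for values in itertools.product(*(grid[key] for key in missing)):
        config = dict(meta)
        config.update(zip(missing, values))
        yield config


# Hypothetical meta config: 'lr' and 'latent_dim' are left unset.
meta = {"dataset": "cifar10", "lr": None, "latent_dim": None}
grid = {"lr": [1e-3, 1e-4], "latent_dim": [16, 32]}
configs = list(expand_meta_config(meta, grid))
```

Each resulting dict could then be written to its own YAML file, one per hyperparameter combination.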

Installation

Requirements: Python 3.6

You can install a Miniconda environment (version 4.5.4):

wget https://repo.anaconda.com/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
bash Miniconda3-4.5.4-Linux-x86_64.sh
export PATH="{miniconda_root}/bin:$PATH"

Installation:

pip install -r requirements.txt
pip install -e .

Training and Evaluation

The paper includes experiments on CIFAR10, SVHN, Camelyon16, and NIH datasets.

To get started with CIFAR10 and SVHN, downloading the data is NOT required (we use the torchvision.datasets implementations of these datasets).

To work with the Camelyon16 and NIH datasets, see the Data Preprocessing section.

For each model:

there are main.py scripts in the corresponding directories anomaly_detection/{deep_geo,deep_if,piad,dpa}

and example train/eval configs in configs/{deep_geo,deep_if,piad,dpa}/{train_example,eval_example}.yaml. See the configs for more details.

Try:

python anomaly_detection/dpa/main.py train_eval configs/dpa/train_wo_pg_example.yaml configs/dpa/eval_wo_pg_example.yaml
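
The same call can be assembled from Python, for example to launch several runs from a script. This is only a sketch: the helper build_train_eval_command is hypothetical, and it assumes the main.py of every model accepts the train_eval subcommand, which is shown above only for dpa.

```python
import sys
from pathlib import Path

MODELS = ("deep_geo", "deep_if", "piad", "dpa")


def build_train_eval_command(model, train_config, eval_config, repo_root="."):
    """Return the argv list for `main.py train_eval` of the given model."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}, expected one of {MODELS}")
    script = Path(repo_root) / "anomaly_detection" / model / "main.py"
    return [sys.executable, str(script), "train_eval",
            str(train_config), str(eval_config)]


command = build_train_eval_command(
    "dpa",
    "configs/dpa/train_wo_pg_example.yaml",
    "configs/dpa/eval_wo_pg_example.yaml",
)
# To launch it from the repository root:
#   import subprocess; subprocess.run(command, check=True)
```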

To reproduce all experiments of the paper, run:

python run_experiments.py

Or specify a subset:

python run_experiments.py --model dpa deep_if --datasets camelyon16 --ablation

Hyperparameter Tuning (cross-validation)

Cross-validation folds used in the paper are stored in ./folds/folds/. Information about classes and images used for validation is in ./folds/validation_classes/. The train/test splits for the Camelyon16 and NIH (AP, PA, a subset) datasets are in ./folds/train_test_split/.

To generate cross-validation folds yourself, use the scripts from the folder anomaly_detection/utils/preprocessing/create_folds/.

For example:

python anomaly_detection/utils/preprocessing/create_folds/cifar10.py -o ./my_folds/folds -n 3
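
For intuition, fold creation can be pictured as below. This is a generic k-fold sketch over item indices, not the logic of the repository scripts (which may split by class or by image list); make_folds and its parameters are hypothetical.

```python
import random


def make_folds(n_items, n_folds, seed=0):
    """Split item indices into n_folds train/val pairs with disjoint val parts."""
    if not 2 <= n_folds <= n_items:
        raise ValueError("need 2 <= n_folds <= n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Part k takes every n_folds-th shuffled index, so sizes differ by at most one.
    parts = [indices[k::n_folds] for k in range(n_folds)]
    folds = []
    for k, val in enumerate(parts):
        # Everything outside the k-th part is used for training in fold k.
        train = [i for j, part in enumerate(parts) if j != k for i in part]
        folds.append({"train": sorted(train), "val": sorted(val)})
    return folds


folds = make_folds(n_items=10, n_folds=3)
```

Every item lands in the validation part of exactly one fold, and a fixed seed keeps the split reproducible.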

Data Preprocessing

Camelyon16

Camelyon16 is a challenge, conducted in 2016, on the automated detection of metastases in hematoxylin and eosin (H&E) stained whole-slide images of lymph node sections.

See the official challenge website for more details.

Preprocessing steps:

  1. Download the data of the Camelyon16 challenge and store it, for example, in the ./data/data/camelyon16_original directory

    ./data/data/camelyon16_original
    │   ...
    └───training
    │   │   lesion_annotations.zip (111 xml)
    │   │───normal (159 tif)
    │       │   normal_001.tif
    │       │   ...
    │   │───tumor (111 tif)
    │       │   tumor_001.tif
    │       │   ...
    └───testing
    │   │   lesion_annotations.zip (48 xml)
    │   │   reference.csv (129 csv)
    │   │───images (129 tif)
    │       │   test_001.tif
    │       │   ...
    │   ...
    
  2. Unzip both lesion_annotations.zip files

  3. Build and run the Docker image from camelyon16_preprocessing (set the correct paths in camelyon16_preprocessing/docker/run.sh), or install OpenSlide on your system.

    cd camelyon16_preprocessing/docker
    bash build.sh
    bash run.sh
  4. Perform preprocessing:

    /opt/anaconda/bin/python /scripts/1_convert_annotation_to_json.py
    /opt/anaconda/bin/python /scripts/2_create_tumor_masks.py
    /opt/anaconda/bin/python /scripts/3_generate_normal_patches_x40.py
    /opt/anaconda/bin/python /scripts/4_generate_tumor_patches_x40.py
    /opt/anaconda/bin/python /scripts/5_select_stain_normalization_image.py
    /opt/anaconda/bin/python /scripts/6_normalize_stain.py
    /opt/anaconda/bin/python /scripts/7_create_train_test_split.py
    /opt/anaconda/bin/python /scripts/8_merge_images.py
    /opt/anaconda/bin/python /scripts/9_resize_to_x20_and_x10.py

    The scripts perform the following steps:

    1. Convert the XML annotation files into JSON format
    2. Create masks for tumor images (from the JSON annotations)
    3. Generate normal patches (at magnification level x40) from the train split and the test split
    4. Generate tumor patches (at magnification level x40) from the train split and the test split
    5. Save a crop from a source image as the "target" of stain normalization
    6. Perform stain normalization of all patches using script normalize_stain.py
    7. Create a train/test split (just create lists of the generated patches)
    8. Move all patches into one folder
    9. Create resized copies of the patches at magnification levels x20 and x10
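
Since each script depends on the output of the previous one, the nine commands can also be driven by a small runner that stops at the first failure. This is a sketch under assumptions: the interpreter and script paths are the ones listed above (valid inside the Docker container), while build_commands, run_pipeline, and the first_step option for resuming are hypothetical conveniences.

```python
import subprocess

PYTHON = "/opt/anaconda/bin/python"  # interpreter path inside the Docker container
SCRIPTS = [
    "1_convert_annotation_to_json.py",
    "2_create_tumor_masks.py",
    "3_generate_normal_patches_x40.py",
    "4_generate_tumor_patches_x40.py",
    "5_select_stain_normalization_image.py",
    "6_normalize_stain.py",
    "7_create_train_test_split.py",
    "8_merge_images.py",
    "9_resize_to_x20_and_x10.py",
]


def build_commands(python=PYTHON, script_dir="/scripts", first_step=1):
    """Return the commands for steps first_step..9, in execution order."""
    if not 1 <= first_step <= len(SCRIPTS):
        raise ValueError(f"first_step must be between 1 and {len(SCRIPTS)}")
    return [[python, f"{script_dir}/{name}"] for name in SCRIPTS[first_step - 1:]]


def run_pipeline(first_step=1):
    """Run the steps in order; check=True aborts at the first failing script."""
    for command in build_commands(first_step=first_step):
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling run_pipeline() runs all steps. run_pipeline(first_step=6) would resume from stain normalization, assuming the earlier outputs are already on disk.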

NIH

  1. Download the NIH data
    NIH
    │   ...
    │   Data_Entry_2017.csv
    │   train_val_list.txt
    │   test_list.txt
    └───images
    │   │   batch_download_zips.py
    │   │   images_001.tar.gz
    │   │   images_002.tar.gz
    │   │   ...
    
  2. Unzip all images (save them, for example, in the ./data/data/nih/ folder)
  3. Resize images to the resolution 300x300 (for faster loading)
    python anomaly_detection/utils/preprocessing/nih_resize.py
  4. Create a train/test split (just filter train/test lists for each view: AP, PA)
    python anomaly_detection/utils/preprocessing/create_folds/cifar10.py [-h] -i CIFAR10_ROOT -o OUTPUT_ROOT [-n N_FOLDS]
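
The per-view filtering in step 4 can be pictured as follows. This is a sketch and not the repository script: split_by_view is a hypothetical helper, and it assumes that Data_Entry_2017.csv has the columns "Image Index" and "View Position" and that the list files contain one image file name per line. The data in the example is made up.

```python
import csv
import io


def split_by_view(data_entry_csv, image_list, views=("AP", "PA")):
    """Group the image names from image_list by their view position.

    data_entry_csv: open text file with the contents of Data_Entry_2017.csv.
    image_list: iterable of image file names (e.g. lines of train_val_list.txt).
    """
    wanted = {name.strip() for name in image_list if name.strip()}
    result = {view: [] for view in views}
    for row in csv.DictReader(data_entry_csv):
        view = row["View Position"]  # assumed column name
        if row["Image Index"] in wanted and view in result:
            result[view].append(row["Image Index"])
    return result


# Tiny made-up example in place of the real files.
example_csv = io.StringIO(
    "Image Index,View Position\n"
    "00000001_000.png,PA\n"
    "00000002_000.png,AP\n"
    "00000003_000.png,PA\n"
)
train_list = ["00000001_000.png\n", "00000002_000.png\n"]
train_by_view = split_by_view(example_csv, train_list)
```

With the real files, the same function would be called once with train_val_list.txt and once with test_list.txt.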