/eval-adv-robustness

Public version of source code for "Rethinking Empirical Evaluation of Adversarial Robustness Using First-Order Attack Methods"

Rethinking Empirical Evaluation of Adversarial Robustness Using First-Order Attack Methods

Kyungmi Lee & Anantha P. Chandrakasan
Department of Electrical Engineering & Computer Science, Massachusetts Institute of Technology
[arXiv]

Summary

Measuring adversarial accuracy against samples generated by first-order attack methods has become a popular approach to empirically approximating adversarial robustness. Here, we examine bounded first-order attack methods, such as FGSM, R-FGSM, and PGD, to understand when they fail to find adversarial examples and whether such failures indicate robustness. We identify three cases where attack methods fail for superficial reasons that do not imply robustness, and thus result in inflated adversarial accuracy:

  • Zero loss: the loss on a sample can be close to zero, resulting in inaccurate gradient computation due to limited numerical precision (illustrated in the sketch after this list)
  • Non-differentiability: the neurons contributing to the final prediction can change between the backward pass that computes gradients and the forward pass that evaluates perturbations, because non-differentiable functions, such as ReLU and max pooling, switch the modes of neurons
  • Require more iterations: certain training conditions seem to make models less amenable to first-order approximation; iterative attacks that use a small, fixed number of iterations can then result in inflated accuracy
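As a rough illustration of the zero-loss case, the minimal PyTorch sketch below (not code from this repository) shows how a numerically zero loss yields a vanishing input gradient, so an FGSM-style step degenerates to the clean sample:

```python
import torch
import torch.nn.functional as F

def fgsm_step(model, x, y, eps):
    """Single FGSM step; degenerates when the loss underflows to zero."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    if grad.abs().max() == 0:
        # The loss is numerically zero (the model is extremely confident),
        # so the gradient carries no information and the "attack" returns
        # the clean sample, inflating adversarial accuracy without
        # implying robustness.
        print("zero gradient: attack fails for a superficial reason")
    return (x + eps * grad.sign()).detach()
```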

We present compensation methods for the three cases above; these methods can be easily combined with existing attack methods to obtain a more precise evaluation metric. Compensation methods combined with FGSM, R-FGSM, and PGD are included in ./compensated_attacks, implemented on top of AdverTorch. We benchmark how adversarial accuracy changes when compensating for these three cases on the CIFAR-10, SVHN, and TinyImageNet datasets. Core code shared among the different experiments is in ./core, and high-level scripts with control arguments for the experiments are located in this directory. Further explanations and experimental results are presented in the paper.
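For reference, a baseline (uncompensated) adversarial-accuracy evaluation with AdverTorch's standard LinfPGDAttack looks roughly like the sketch below; the exact interface of the compensated attacks in ./compensated_attacks may differ, so treat this as a generic example rather than this repository's API. The eps and step-size values are placeholders:

```python
import torch
from advertorch.attacks import LinfPGDAttack

def adversarial_accuracy(model, loader, device="cuda"):
    """Fraction of samples still classified correctly under L-inf PGD."""
    model.eval()
    attack = LinfPGDAttack(model, eps=8 / 255, eps_iter=2 / 255,
                           nb_iter=20, rand_init=True,
                           clip_min=0.0, clip_max=1.0)
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack.perturb(x, y)          # generate bounded perturbations
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total
```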

Dependencies

Before using our code, please check the following dependencies:

  • PyTorch 1.0.1 (with CUDA version 10.0 & cuDNN 7.4)
  • AdverTorch: clone AdverTorch into this directory after cloning our repo. If you instead install AdverTorch directly via pip, fix the relative path to AdverTorch accordingly in ./compensated_attacks.
  • For TinyImageNet, download the dataset (https://tiny-imagenet.herokuapp.com/) and process it so that PyTorch's ImageFolder can load the data. Split the validation set in half to form validation and test sets, since the test set supplied with the dataset does not contain labels (a rough sketch of this step is shown after this list). Fix the relative path to the folder containing TinyImageNet in tinyimagenet_training.py and tinyimagenet_evaluate_all.py.
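As one possible way to do the TinyImageNet preprocessing above, the hypothetical helper below (not part of this repo) reorganizes the official validation images into per-class folders that ImageFolder can read and splits them in half into validation and test sets. It assumes the standard layout of val/images/*.JPEG plus a tab-separated val/val_annotations.txt whose first two fields are the file name and the class id:

```python
import os
import random
import shutil

def split_tinyimagenet_val(val_dir, out_val_dir, out_test_dir, seed=0):
    """Copy TinyImageNet val images into class subfolders, split 50/50."""
    with open(os.path.join(val_dir, "val_annotations.txt")) as f:
        pairs = [line.split("\t")[:2] for line in f if line.strip()]
    random.Random(seed).shuffle(pairs)
    halves = (pairs[: len(pairs) // 2], pairs[len(pairs) // 2:])
    for out_dir, half in zip((out_val_dir, out_test_dir), halves):
        for fname, cls in half:
            dst = os.path.join(out_dir, cls)      # one folder per class id
            os.makedirs(dst, exist_ok=True)
            shutil.copy(os.path.join(val_dir, "images", fname), dst)
```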