Evaluating and Understanding the Robustness of Adversarial Logit Pairing

The code in this repository, forked from the official implementation, evaluates the robustness of Adversarial Logit Pairing, a proposed defense against adversarial examples.

On the ImageNet 64x64 dataset, with an L-infinity perturbation of 16/255 (the threat model considered in the original paper), we can make the classifier accuracy 0.1% and generate targeted adversarial examples (with randomly chosen target labels) with 98.6% success rate using the provided code and models.

See our writeup here for our analysis, including visualizations of the loss landscape induced by Adversarial Logit Pairing.

Pictures

Obligatory pictures of adversarial examples (with randomly chosen target classes).

Setup

Download and untar the ALP-trained ResNet-v2-50 model into the root of the repository.

RobustML evaluation

Run with:

python robustml_eval.py --imagenet-path <path>