This is the code for the ICML'19 paper "Theoretically Principled Trade-off between Robustness and Accuracy" by Hongyang Zhang (CMU), Yaodong Yu (University of Virginia), Jiantao Jiao (UC Berkeley), Eric P. Xing (CMU & Petuum Inc.), Laurent El Ghaoui (UC Berkeley), and Michael I. Jordan (UC Berkeley).
The methodology is the first-place winner of the NeurIPS 2018 Adversarial Vision Challenge (Robust Model Track).
An attack method transferred from the TRADES robust model won first place in the NeurIPS 2018 Adversarial Vision Challenge (Targeted Attack Track).
- Python (3.6.4)
- PyTorch (0.4.1)
- CUDA
- numpy
We suggest installing the dependencies using Anaconda or Miniconda. Here is an example command:
$ wget https://repo.anaconda.com/archive/Anaconda3-5.1.0-Linux-x86_64.sh
$ bash Anaconda3-5.1.0-Linux-x86_64.sh
$ source ~/.bashrc
$ conda install pytorch=0.4.1
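TRADES minimizes a regularized surrogate loss L(.,.) (e.g., the cross-entropy loss) for adversarial training:
$$\min_f \mathbb{E}\Big\{\mathcal{L}\big(f(X),Y\big) + \beta \max_{X' \in \mathbb{B}(X,\epsilon)} \mathcal{L}\big(f(X), f(X')\big)\Big\}$$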
Important: the surrogate loss L(.,.) in the second term should be classification-calibrated according to our theory, in contrast to the L2 loss used in Adversarial Logit Pairing.
The first term encourages the natural error to be small by minimizing the "difference" between f(X) and Y, while the second (regularization) term encourages the output to be smooth; that is, it pushes the decision boundary of the classifier away from the sample instances by minimizing the "difference" between the prediction on the natural example, f(X), and that on the adversarial example, f(X′). The tuning parameter β plays a critical role in balancing the importance of the natural and robust errors.
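To make the two terms concrete, here is a minimal NumPy sketch of the objective on one batch. It assumes the adversarial logits f(X′) have already been produced by the inner maximization, and it uses the KL divergence between softmax outputs as the classification-calibrated surrogate in the second term; that choice, and the function name trades_objective, are assumptions of this sketch rather than the repository's code.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the class axis (last axis).
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def trades_objective(logits_nat, logits_adv, y, beta):
    """Hypothetical sketch of the TRADES objective on one batch.

    logits_nat: (N, K) logits f(X) on natural inputs
    logits_adv: (N, K) logits f(X') on adversarial inputs (already computed)
    y:          (N,) integer class labels
    """
    logp_nat = log_softmax(logits_nat)
    logp_adv = log_softmax(logits_adv)
    # First term: cross-entropy between f(X) and Y (surrogate for the natural error).
    natural = -logp_nat[np.arange(len(y)), y].mean()
    # Second term: KL(softmax f(X) || softmax f(X')), averaged over the batch.
    p_nat = np.exp(logp_nat)
    robust = (p_nat * (logp_nat - logp_adv)).sum(axis=-1).mean()
    return natural + beta * robust
```

With beta = 0 the objective reduces to the natural cross-entropy loss; a larger beta puts more weight on making f(X′) agree with f(X).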
Left figure: decision boundary by natural training. Right figure: decision boundary by TRADES.
Natural training (a standard PyTorch training loop):
import torch.nn.functional as F

def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        # standard (natural) cross-entropy loss
        loss = F.cross_entropy(model(data), target)
        loss.backward()
        optimizer.step()
To apply TRADES, cd into your working directory and put 'trades.py' in it. Then replace F.cross_entropy() above with trades_loss():
from trades import trades_loss

def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        # calculate robust loss - TRADES loss
        loss = trades_loss(model=model,
                           x_natural=data,
                           y=target,
                           optimizer=optimizer,
                           step_size=args.step_size,
                           epsilon=args.epsilon,
                           perturb_steps=args.num_steps,
                           beta=args.beta,
                           distance='l_inf')
        loss.backward()
        optimizer.step()
Arguments:
- step_size: step size for perturbation
- epsilon: limit on the perturbation size
- num_steps (passed to trades_loss() as perturb_steps): number of perturbation iterations for projected gradient descent (PGD)
- beta: trade-off regularization parameter
- distance: type of perturbation distance, 'l_inf' or 'l_2'
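The arguments step_size, epsilon and num_steps control the inner maximization that generates X′. For the 'l_inf' distance, a common form of this procedure is projected gradient ascent with sign steps. The NumPy sketch below shows only the update and the two projections; it assumes a user-supplied grad_fn (hypothetical) returning the gradient of the loss being maximized with respect to the input, and inputs normalized to [0, 1]. It illustrates the technique and is not the code in trades.py.

```python
import numpy as np


def pgd_linf(x_nat, grad_fn, step_size, epsilon, num_steps):
    """Hypothetical sketch of L_inf projected gradient ascent.

    grad_fn(x_adv) must return the gradient of the loss to be maximized
    with respect to x_adv (same shape as x_nat).
    """
    x_adv = x_nat.copy()
    for _ in range(num_steps):
        # Ascent step along the sign of the gradient.
        x_adv = x_adv + step_size * np.sign(grad_fn(x_adv))
        # Project back onto the L_inf ball of radius epsilon around x_nat ...
        x_adv = np.clip(x_adv, x_nat - epsilon, x_nat + epsilon)
        # ... and onto the valid image range [0, 1].
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

The projection after every step is what keeps the final X′ inside the epsilon-ball B(X, ε), however many steps are taken.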
The trade-off regularization parameter beta can be set in [1, 10]. Larger beta leads to more robust and less accurate models.
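The training loop above reads these settings from an args object. A minimal argparse sketch that produces args.step_size, args.epsilon, args.num_steps and args.beta could look like this; the flag names and default values are illustrative assumptions (only the epsilon default, 0.031, matches a value stated in this README, for the CIFAR10 model), not necessarily those of the repository's training scripts.

```python
import argparse


def build_parser():
    # Hypothetical flags; argparse maps "--step-size" to args.step_size, etc.
    parser = argparse.ArgumentParser(description="TRADES adversarial training (sketch)")
    parser.add_argument("--epsilon", type=float, default=0.031,
                        help="limit on the perturbation size")
    parser.add_argument("--num-steps", type=int, default=10,
                        help="number of PGD perturbation iterations")
    parser.add_argument("--step-size", type=float, default=0.007,
                        help="step size for perturbation")
    parser.add_argument("--beta", type=float, default=6.0,
                        help="trade-off regularization parameter, suggested range [1, 10]")
    parser.add_argument("--distance", choices=["l_inf", "l_2"], default="l_inf",
                        help="type of perturbation distance")
    return parser


# Use the defaults here; a real script would call parse_args() to read sys.argv.
args = build_parser().parse_args([])
```

For example, build_parser().parse_args(["--beta", "1.0"]) selects the less robust but more accurate end of the suggested range.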
Basic MNIST example (adversarial training by TRADES):
$ python mnist_example_trades.py
We adapt main.py in [link] to use our new loss trades_loss() during training.
- Train WideResNet-34-10 model on CIFAR10:
$ python train_trades_cifar10.py
- Train CNN model (four convolutional layers + three fully-connected layers) on MNIST:
$ python train_trades_mnist.py
- Train CNN model (two convolutional layers + two fully-connected layers) on MNIST (digits '1' and '3') for a binary classification problem:
$ python train_trades_mnist_binary.py
- Evaluate the robust WideResNet-34-10 model on CIFAR10 under the FGSM-20 attack (20-step iterative FGSM, i.e., PGD):
$ python pgd_attack_cifar10.py
- Evaluate the robust CNN model on MNIST under the FGSM-40 attack (40-step iterative FGSM, i.e., PGD):
$ python pgd_attack_mnist.py
Results in the NeurIPS 2018 Adversarial Vision Challenge [link]
TRADES won 1st place out of 1,995 submissions in the NeurIPS 2018 Adversarial Vision Challenge (Robust Model Track) on the Tiny ImageNet dataset, surpassing the runner-up approach by 11.41% in terms of L2 perturbation distance.
Top-6 results (out of 1,995 submissions) in the NeurIPS 2018 Adversarial Vision Challenge (Robust Model Track). The vertical axis represents the mean L2 perturbation distance that makes robust models fail to output correct labels.
Results in the Unrestricted Adversarial Examples Challenge [link]
In response to the Unrestricted Adversarial Examples Challenge, we implement TRADESv2 (a variant of TRADES with extra spatial-transformation-invariant considerations) on the bird-or-bicycle dataset.
All percentages below correspond to the model's accuracy at 80% coverage.
Defense | Submitted by | Clean data | Common corruptions | Spatial grid attack | SPSA attack | Boundary attack | Submission Date |
---|---|---|---|---|---|---|---|
Pytorch ResNet50 (trained on bird-or-bicycle extras) | TRADESv2 | 100.0% | 100.0% | 99.5% | 100.0% | 95.0% | Jan 17th, 2019 (EST) |
Keras ResNet (trained on ImageNet) | Google Brain | 100.0% | 99.2% | 92.2% | 1.6% | 4.0% | Sept 29th, 2018 |
Pytorch ResNet (trained on bird-or-bicycle extras) | Google Brain | 98.8% | 74.6% | 49.5% | 2.5% | 8.0% | Oct 1st, 2018 |
TRADES is a new baseline method for adversarial defenses, and we welcome all kinds of attack methods against our defense models. We provide checkpoints of our robust models on the MNIST and CIFAR10 datasets. On both datasets, we normalize all the images to [0, 1].
cd TRADES
mkdir checkpoints
cd checkpoints
wget http://people.virginia.edu/~yy8ms/TRADES/model_mnist_smallcnn.pt
wget http://people.virginia.edu/~yy8ms/TRADES/model_cifar_wrn.pt
cd TRADES
mkdir data_attack
cd data_attack
wget http://people.virginia.edu/~yy8ms/TRADES/cifar10_X.npy
wget http://people.virginia.edu/~yy8ms/TRADES/cifar10_Y.npy
wget http://people.virginia.edu/~yy8ms/TRADES/mnist_X.npy
wget http://people.virginia.edu/~yy8ms/TRADES/mnist_Y.npy
All the images in both datasets are normalized to [0, 1].
- cifar10_X.npy -- a (10,000, 32, 32, 3) numpy array
- cifar10_Y.npy -- a (10,000, ) numpy array
- mnist_X.npy -- a (10,000, 28, 28) numpy array
- mnist_Y.npy -- a (10,000, ) numpy array
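Before attacking, it can help to load these arrays and put them in the layout a PyTorch model usually expects. The sketch below assumes the files sit in one directory (e.g. ./data_attack), that the labels are integer class indices, and that the models take channel-first (NCHW) input, as PyTorch convolutions do; the helper name load_attack_data is hypothetical and the layout conversion is an assumption of this sketch, not something this README specifies.

```python
import os

import numpy as np


def load_attack_data(data_dir, dataset):
    """Hypothetical helper: load <dataset>_X.npy and <dataset>_Y.npy.

    Returns float32 images in NCHW layout and the label array.
    """
    if dataset not in ("cifar10", "mnist"):
        raise ValueError("unknown dataset: " + dataset)
    x = np.load(os.path.join(data_dir, dataset + "_X.npy")).astype(np.float32)
    y = np.load(os.path.join(data_dir, dataset + "_Y.npy"))
    if x.shape[0] != y.shape[0]:
        raise ValueError("images and labels must have the same length")
    # The README states that all images are already normalized to [0, 1].
    if x.min() < 0.0 or x.max() > 1.0:
        raise ValueError("images must lie in [0, 1]")
    if dataset == "cifar10":
        x = np.transpose(x, (0, 3, 1, 2))  # (N, 32, 32, 3) -> (N, 3, 32, 32)
    else:
        x = x[:, np.newaxis, :, :]  # (N, 28, 28) -> (N, 1, 28, 28)
    # Contiguous memory makes later conversion (e.g. torch.from_numpy) straightforward.
    return np.ascontiguousarray(x), y
```

For example, x, y = load_attack_data('./data_attack', 'cifar10') returns a (10,000, 3, 32, 32) float32 array; remember to transpose back to (N, 32, 32, 3) before saving cifar10_X_adv.npy if you keep the original file layout.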
import torch
from models.small_cnn import SmallCNN

device = torch.device("cuda")
model = SmallCNN().to(device)
model.load_state_dict(torch.load('./checkpoints/model_mnist_smallcnn.pt'))
model.eval()  # inference mode before evaluating or attacking the model
For our model model_mnist_smallcnn.pt, the limit on the perturbation size is epsilon=0.3 (L_infinity perturbation distance).
Attack | Submitted by | Natural Accuracy | Robust Accuracy |
---|---|---|---|
FGSM-1,000 | (initial entry) | 99.48% | 95.60% |
FGSM-40 | (initial entry) | 99.48% | 96.07% |
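The two accuracy columns can be read as the percentage of test points classified correctly on the natural images (mnist_X.npy) and on the attacked images, respectively. A minimal NumPy sketch, assuming you already have integer predicted labels for both arrays; the helper name accuracies is hypothetical, and the official evaluation scripts may compute these numbers differently.

```python
import numpy as np


def accuracies(pred_nat, pred_adv, y):
    """Hypothetical helper: (natural accuracy, robust accuracy) in percent."""
    y = np.asarray(y)
    # Fraction of correct predictions on the natural images.
    natural = 100.0 * np.mean(np.asarray(pred_nat) == y)
    # Fraction of correct predictions on the adversarial images.
    robust = 100.0 * np.mean(np.asarray(pred_adv) == y)
    return natural, robust
```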
- Step 1: Download mnist_X.npy and mnist_Y.npy.
- Step 2: Run your own attack on mnist_X.npy and save your adversarial images as mnist_X_adv.npy.
- Step 3: Put mnist_X_adv.npy under ./data_attack.
- Step 4: Run the evaluation code:
$ python evaluate_attack_mnist.py
Note that the adversarial images should be in [0, 1] and the largest perturbation distance is epsilon = 0.3 (L_infinity).
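Steps 2 and 3 require the saved array to respect these constraints. A hedged NumPy sketch of a final projection-and-save step is shown below; save_adversarial is a hypothetical helper, x_candidate stands for whatever your attack produced (same shape as mnist_X.npy), and the float32 dtype is an assumption rather than a stated requirement.

```python
import numpy as np


def save_adversarial(x_nat, x_candidate, epsilon, path):
    """Hypothetical helper: project x_candidate onto the allowed set and save it."""
    if x_candidate.shape != x_nat.shape:
        raise ValueError("adversarial array must have the same shape as the natural one")
    # Stay within epsilon of the natural image in L_infinity distance ...
    x_adv = np.clip(x_candidate, x_nat - epsilon, x_nat + epsilon)
    # ... and inside the valid pixel range [0, 1].
    x_adv = np.clip(x_adv, 0.0, 1.0).astype(np.float32)
    np.save(path, x_adv)
    return x_adv
```

For example: save_adversarial(np.load('./data_attack/mnist_X.npy'), x_candidate, 0.3, './data_attack/mnist_X_adv.npy'). If the evaluation compares distances strictly, shrinking epsilon by a tiny margin (e.g. 0.3 - 1e-6) absorbs floating-point rounding.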
import torch
from models.wideresnet import WideResNet

device = torch.device("cuda")
model = WideResNet().to(device)
model.load_state_dict(torch.load('./checkpoints/model_cifar_wrn.pt'))
model.eval()  # inference mode before evaluating or attacking the model
For our model model_cifar_wrn.pt, the limit on the perturbation size is epsilon=0.031 (L_infinity perturbation distance).
Attack | Submitted by | Natural Accuracy | Robust Accuracy |
---|---|---|---|
FGSM-1,000 | (initial entry) | 84.92% | 56.43% |
FGSM-20 | (initial entry) | 84.92% | 56.61% |
DeepFool (L_inf) | (initial entry) | 84.92% | 61.38% |
DeepFool (L_2) | (initial entry) | 84.92% | 81.55% |
LBFGSAttack | (initial entry) | 84.92% | 81.58% |
MI-FGSM | (initial entry) | 84.92% | 57.95% |
CW | (initial entry) | 84.92% | 81.24% |
FGSM | (initial entry) | 84.92% | 61.06% |
- Step 1: Download cifar10_X.npy and cifar10_Y.npy.
- Step 2: Run your own attack on cifar10_X.npy and save your adversarial images as cifar10_X_adv.npy.
- Step 3: Put cifar10_X_adv.npy under ./data_attack.
- Step 4: Run the evaluation code:
$ python evaluate_attack_cifar10.py
Note that the adversarial images should be in [0, 1] and the largest perturbation distance is epsilon = 0.031 (L_infinity).
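Before running evaluate_attack_cifar10.py, it may be worth checking the saved file locally. The sketch below (a hypothetical helper, not part of the repository) reports whether an adversarial array has the right shape, stays in [0, 1], and respects the L_infinity budget; the small tol for floating-point rounding is an assumption.

```python
import numpy as np


def check_adversarial(x_nat, x_adv, epsilon, tol=1e-6):
    """Hypothetical sanity check; returns a list of problems (empty if valid)."""
    problems = []
    if x_adv.shape != x_nat.shape:
        problems.append("shape %s != %s" % (x_adv.shape, x_nat.shape))
        return problems
    if x_adv.min() < 0.0 or x_adv.max() > 1.0:
        problems.append("pixel values outside [0, 1]")
    # Largest absolute per-pixel change, i.e. the L_infinity distance.
    linf = np.max(np.abs(x_adv.astype(np.float64) - x_nat.astype(np.float64)))
    if linf > epsilon + tol:
        problems.append("L_inf distance %.5f exceeds epsilon %.5f" % (linf, epsilon))
    return problems
```

A typical call is check_adversarial(np.load('./data_attack/cifar10_X.npy'), np.load('./data_attack/cifar10_X_adv.npy'), epsilon=0.031); an empty list means none of these checks failed.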
For technical details and full experimental results, please check the paper.
@article{Zhang2019theoretically,
author = {Hongyang Zhang and Yaodong Yu and Jiantao Jiao and Eric P. Xing and Laurent El Ghaoui and Michael I. Jordan},
title = {Theoretically Principled Trade-off between Robustness and Accuracy},
journal = {arXiv preprint arXiv:1901.08573},
year = {2019}
}
Please contact yy8ms@virginia.edu and hongyanz@cs.cmu.edu if you have any questions about the code. Enjoy!