ICML'20: Attacks Which Do Not Kill Training Make Adversarial Learning Stronger

Friendly Adversarial Training Code

This repository provides the code for friendly adversarial training (FAT).

ICML 2020 Paper: Attacks Which Do Not Kill Training Make Adversarial Learning Stronger (https://arxiv.org/abs/2002.11242) Jingfeng Zhang*, Xilie Xu*, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama and Mohan Kankanhalli

What is the nature of the adversarial training?

Adversarial data can easily fool a classifier obtained by standard training. Adversarial training incorporates adversarial data into the training process. It has two purposes: (a) to classify the data correctly, and (b) to make the decision boundary thick so that no data fall inside it.

The purposes of the adversarial training

Conventional formulation of the adversarial training

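Conventional adversarial training is based on the minimax formulation:

$$\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \ell(f(\tilde{x}_i), y_i),$$

where

$$\tilde{x}_i = \arg\max_{\tilde{x} \in \mathcal{B}_\epsilon[x_i]} \ell(f(\tilde{x}), y_i).$$

Here $f$ is the classifier, $\ell$ is the loss, and $\mathcal{B}_\epsilon[x_i]$ is the $\epsilon$-ball around the natural input $x_i$. The inner maximization finds the most adversarial data. The outer minimization finds a classifier that fits the generated adversarial data.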

The minimax formulation is pessimistic.

Minimax-based adversarial training severely degrades natural generalization. Why? It suffers from a severe cross-over mixture problem: the adversarial data of different classes overshoot into each other's areas. Learning from such adversarial data is very difficult.

Cross-over mixture problem of the minimax-based adversarial training

Our min-min formulation for the adversarial training.

The outer minimization remains the same. Instead of generating adversarial data via the inner maximization, we generate friendly adversarial data that minimize the loss value, subject to two constraints: (a) the adversarial data is misclassified, and (b) the wrong prediction on the adversarial data is better than the desired prediction by at least a margin.
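The inner problem described above can be written out as a constrained minimization. The following is a sketch reconstructed from the description, not a verbatim copy of the paper's equation; the symbols $\rho$ (margin), $\ell$ (loss), $f$ (classifier), $\mathcal{Y}$ (label set), and $\mathcal{B}_\epsilon[x_i]$ (the $\epsilon$-ball around $x_i$) are notational assumptions.

```latex
\tilde{x}_i = \arg\min_{\tilde{x} \in \mathcal{B}_\epsilon[x_i]} \ell(f(\tilde{x}), y_i)
\quad \text{s.t.} \quad
\ell(f(\tilde{x}), y_i) - \min_{y \in \mathcal{Y}} \ell(f(\tilde{x}), y) \ge \rho
```

The constraint covers both conditions at once: the loss on the true label $y_i$ must exceed the smallest loss over all labels by at least $\rho$, so some wrong label is preferred (misclassification) and preferred by a margin.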

Let us look at comparisons between minimax formulation and min-min formulation.

Comparisons between minimax formulation and min-min formulation

A Realization of the Min-min Formulation --- Friendly Adversarial Training (FAT)

Friendly adversarial training (FAT) updates the model using friendly adversarial data generated by early-stopped PGD. Early-stopped PGD stops the PGD iterations once the adversarial data is misclassified. (This is controlled by the hyperparameter tau in the code. Note that when tau equals the maximum number of perturbation steps num_steps, conventional adversarial training methods, e.g., AT, TRADES, and MART, become special cases of FAT.)

Conventional adversarial training uses PGD to search for the most adversarial data. Friendly adversarial training uses early-stopped PGD to search for friendly adversarial data.
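To make the early-stopping idea concrete, here is a minimal pure-Python sketch on a toy linear softmax classifier. It is not the repository's implementation: the function names, the linear model, and the reading of tau as "the number of extra PGD steps allowed once the data is misclassified" are all assumptions made for illustration.

```python
import math

def logits(W, b, x):
    """Scores of a toy linear classifier: one row of W and one bias per class."""
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(W, b, x, y):
    return -math.log(softmax(logits(W, b, x))[y])

def loss_grad_x(W, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = softmax(logits(W, b, x))
    p[y] -= 1.0  # d(loss)/d(logits) = softmax - one_hot(y)
    return [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def early_stopped_pgd(W, b, x, y, epsilon, step_size, num_steps, tau):
    """L-infinity PGD that stops early (hypothetical helper, not the repo's API).

    Assumption: tau is a budget of extra steps that may still be taken
    while the current point is already misclassified.
    """
    x_adv = list(x)
    budget = tau
    steps_taken = 0
    for _ in range(num_steps):
        if predict(W, b, x_adv) != y:
            if budget == 0:
                break  # misclassified and no budget left: stop early
            budget -= 1
        grad = loss_grad_x(W, b, x_adv, y)
        stepped = []
        for x0, xa, g in zip(x, x_adv, grad):
            v = xa + step_size * ((g > 0) - (g < 0))      # signed gradient ascent step
            v = min(max(v, x0 - epsilon), x0 + epsilon)    # project onto the epsilon-ball
            stepped.append(min(max(v, 0.0), 1.0))          # keep inputs in [0, 1]
        x_adv = stepped
        steps_taken += 1
    return x_adv, steps_taken
```

Under this reading, tau=0 stops at the first misclassified point, while tau=num_steps never triggers the early stop and reduces to ordinary PGD, which is consistent with the special-case remark above.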

Preferred Prerequisites

  • Python (3.6)
  • PyTorch (1.2.0)
  • CUDA
  • numpy

Running FAT, FAT for TRADES, FAT for MART on benchmark datasets (CIFAR-10 and SVHN)

Here are examples:

  • Train WRN-32-10 model on CIFAR-10 and compare our results with AT, CAT and DAT:
CUDA_VISIBLE_DEVICES='0' python FAT.py --epsilon 0.031
CUDA_VISIBLE_DEVICES='0' python FAT.py --epsilon 0.062

White-box evaluations on WRN-32-10

| Defense | Natural Acc. | FGSM Acc. | PGD-20 Acc. | C&W Acc. |
|---|---|---|---|---|
| AT(Madry) | 87.30% | 56.10% | 45.80% | 46.80% |
| CAT | 77.43% | 57.17% | 46.06% | 42.28% |
| DAT | 85.03% | 63.53% | 48.70% | 47.27% |
| FAT () | 89.34 ± 0.221% | 65.52 ± 0.355% | 46.13 ± 0.049% | 46.82 ± 0.517% |
| FAT () | 87.00 ± 0.203% | 65.94 ± 0.244% | 49.86 ± 0.328% | 48.65 ± 0.176% |

The results of AT(Madry), CAT, and DAT are those reported in the DAT paper. FAT is evaluated under the same settings.

  • Train WRN-34-10 model on CIFAR-10 and compare our results with TRADES and MART:
CUDA_VISIBLE_DEVICES='0' python FAT_for_TRADES.py --epsilon 0.031
CUDA_VISIBLE_DEVICES='0' python FAT_for_TRADES.py --epsilon 0.062
CUDA_VISIBLE_DEVICES='0' python FAT_for_MART.py --epsilon 0.031
CUDA_VISIBLE_DEVICES='0' python FAT_for_MART.py --epsilon 0.062

White-box evaluations on WRN-34-10

| Defense | Natural Acc. | FGSM Acc. | PGD-20 Acc. | C&W Acc. |
|---|---|---|---|---|
| TRADES() | 88.64% | 56.38% | 49.14% | - |
| FAT for TRADES() | 89.94 ± 0.303% | 61.00 ± 0.418% | 49.70 ± 0.653% | 49.35 ± 0.363% |
| TRADES() | 84.92% | 61.06% | 56.61% | 54.47% |
| FAT for TRADES() | 86.60 ± 0.548% | 61.79 ± 0.570% | 55.98 ± 0.209% | 54.29 ± 0.173% |
| FAT for TRADES() | 84.39 ± 0.030% | 61.73 ± 0.131% | 57.12 ± 0.233% | 54.36 ± 0.177% |

The results of TRADES ( and ) are those reported in the TRADES paper. FAT for TRADES is evaluated under the same settings. Note that the evaluations above follow the description in the TRADES paper, i.e., adversarial data are generated without a random start (rand_init=False). However, the TRADES GitHub code uses a random start (rand_init=True) before the PGD perturbation, which deviates from the statements in their paper. For a fair evaluation of FAT with random start, please refer to Table 3 in our paper.
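The difference between the two evaluation settings comes down to where PGD starts. The sketch below illustrates this with a hypothetical helper; the uniform noise within the epsilon-ball is an assumption chosen for illustration, and the noise actually used in the TRADES code or in this repository may differ.

```python
import random

def pgd_start_point(x, epsilon, rand_init, rng=None):
    """Return the point from which PGD begins (hypothetical helper).

    rand_init=False: start exactly at the natural input x.
    rand_init=True:  start at x plus noise inside the epsilon-ball
                     (uniform noise is an assumption of this sketch).
    """
    if not rand_init:
        return list(x)
    if rng is None:
        rng = random.Random()
    # Perturb each coordinate, then keep the result inside the image range [0, 1].
    return [min(1.0, max(0.0, v + rng.uniform(-epsilon, epsilon))) for v in x]
```

Because a random start changes the attack's trajectory, accuracies measured with rand_init=True and rand_init=False are not directly comparable.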

Want to attack FAT? Sure!

We welcome attacks of all kinds on our defense models. On both the CIFAR-10 and SVHN datasets, all images are normalized to [0,1].

We will upload the trained model later!

Reference

@article{zhang2020fat,
  title={Attacks Which Do Not Kill Training Make Adversarial Learning Stronger},
  author={Zhang, Jingfeng and Xu, Xilie and Han, Bo and Niu, Gang and Cui, Lizhen and Sugiyama, Masashi and Kankanhalli, Mohan},
  journal={arXiv preprint arXiv:2002.11242},
  year={2020}
}

Contact

Please contact j-zhang@comp.nus.edu.sg and xuxilie@mail.sdu.edu.cn if you have any questions about the code.