Related papers for robust machine learning (we mainly focus on defenses).
Since dozens of new papers on adversarial defense appear at each conference, we only update the ones we have recently read and consider insightful.
Anyone is welcome to submit a pull request for related, unlisted papers on adversarial defense that are published at peer-reviewed conferences (ICML/NeurIPS/ICLR/CVPR, etc.) or released on arXiv.
- General Defenses (training phase)
- General Defenses (inference phase)
- Adversarial Detection
- Certified Defense and Model Verification
- Theoretical Analysis
- Empirical Analysis
- Beyond Safety (Adversarial for Good)
- Seminal Work
- Benchmark Datasets
- Better Diffusion Models Further Improve Adversarial Training (ICML 2023)
  This paper advocates that better diffusion models such as EDM can further improve adversarial training beyond using DDPM, achieving new state-of-the-art performance on CIFAR-10/100 as listed on RobustBench.
- FrequencyLowCut Pooling -- Plug & Play against Catastrophic Overfitting (ECCV 2022)
  This paper proposes a novel aliasing-free downsampling layer to prevent catastrophic overfitting during simple Fast Gradient Sign Method (FGSM) adversarial training.
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition (ICML 2022)
  This paper advocates that robustness and accuracy are not at odds, as long as we slightly modify the definition of robust error. Efficient ways of optimizing the new SCORE objective are provided.
- Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks (NeurIPS 2021)
  This paper incorporates stability conditions from control theory into neural ODEs to induce locally stable models.
- Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart (CVPR 2022)
  This paper proposes a coupled rejection strategy, where two simple but well-designed rejection metrics are coupled to provably distinguish any misclassified sample from correctly classified ones.
- Fixing Data Augmentation to Improve Adversarial Robustness (NeurIPS 2021)
  This paper shows that after applying weight moving average, data augmentation (either by transformations or generative models) can further improve the robustness of adversarial training.
- Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness? (ICLR 2022)
  This paper verifies that leveraging more data sampled from a (high-quality) generative model trained on the same dataset (e.g., CIFAR-10) can still improve the robustness of adversarially trained models, without using any extra data.
- Towards Robust Neural Networks via Close-loop Control (ICLR 2021)
  This paper introduces a closed-loop control framework to enhance the adversarial robustness of trained networks.
- Understanding and Improving Fast Adversarial Training (NeurIPS 2020)
  A systematic study of catastrophic overfitting in adversarial training, its causes, and ways of resolving it. The proposed regularizer, GradAlign, helps to prevent catastrophic overfitting and scale FGSM training to large Linf-perturbations.
- Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks (ICML 2020)
  This paper uses a perturbation-dependent label smoothing method to generalize adversarially trained models to unseen attacks.
- Smooth Adversarial Training
  This paper advocates using smooth variants of ReLU during adversarial training, which can achieve state-of-the-art performance on ImageNet.
- Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness (ICLR 2020)
  This paper rethinks the drawbacks of softmax cross-entropy in the adversarial setting, and proposes the MMC method to induce high-density regions in the feature space.
- Jacobian Adversarially Regularized Networks for Robustness (ICLR 2020)
  This paper shows that a generally more interpretable model can potentially be more robust against adversarial attacks.
- Fast is better than free: Revisiting adversarial training (ICLR 2020)
  This paper proposes several tricks to make FGSM-based adversarial training effective.
- Adversarial Training and Provable Defenses: Bridging the Gap (ICLR 2020)
  This paper proposes a layerwise adversarial training method, which gradually optimizes on latent adversarial examples from low-level to high-level layers.
- Improving Adversarial Robustness Requires Revisiting Misclassified Examples (ICLR 2020)
  This paper proposes the MART method, which, compared to the TRADES formulation, adds a boosted CE loss to further suppress the second-largest prediction and a weighted KL term (similar to a focal loss).
- Adversarial Interpolation Training: A Simple Approach for Improving Model Robustness
  This paper introduces the mixup method into adversarial training to improve model performance on clean images.
- Are labels required for improving adversarial robustness? (NeurIPS 2019)
  This paper exploits unlabeled data to further improve adversarial robustness.
- Adversarial Robustness through Local Linearization (NeurIPS 2019)
  This paper introduces a local linearization regularizer into the adversarial training process.
- Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks (NeurIPS 2019)
  A method to efficiently certify the robustness of GBDTs and to integrate the certificate into training (leading to an upper bound on the worst-case loss). The obtained certified accuracy is higher than for other robust GBDTs and is competitive with provably robust CNNs.
- You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle (NeurIPS 2019)
  This paper provides a fast method for adversarial training from the perspective of optimal control.
- Adversarial Training for Free! (NeurIPS 2019)
  A fast method for adversarial training, which shares the back-propagation gradients between updating weights and crafting adversarial examples.
- ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation (ICML 2019)
  This paper demonstrates the global low-rank structure within images, and leverages matrix estimation to exploit this underlying structure for better adversarial robustness.
- Using Pre-Training Can Improve Model Robustness and Uncertainty (ICML 2019)
  This paper shows that adversarial robustness can transfer and that adversarial pre-training can increase adversarial robustness by ~10% accuracy.
- Theoretically Principled Trade-off between Robustness and Accuracy (ICML 2019)
  A variant of adversarial training, TRADES, which won the defense track of the NeurIPS 2018 Adversarial Competition (see the TRADES-style sketch below).
- Robust Decision Trees Against Adversarial Examples (ICML 2019)
  A method to enhance the robustness of tree models, including GBDTs.
- Improving Adversarial Robustness via Promoting Ensemble Diversity (ICML 2019)
  Previous work constructs ensemble defenses by individually enhancing each member and then directly averaging the predictions. In this work, the authors propose the adaptive diversity promoting (ADP) regularizer to further improve robustness by promoting ensemble diversity, as a method orthogonal to other defenses.
- Feature Denoising for Improving Adversarial Robustness (CVPR 2019)
  This paper applies non-local neural networks and large-scale adversarial training with 128 GPUs (using the training tricks from 'Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour'), which shows a large improvement over the previous SOTA trained with 50 GPUs.
- Improving the Generalization of Adversarial Training with Domain Adaptation (ICLR 2019)
  This work proposes additional regularization terms to match the domains of clean and adversarial logits in adversarial training.
- A Spectral View of Adversarially Robust Features (NeurIPS 2018)
  Given the entire dataset X, the eigenvectors of the spectral graph are used as robust features. [Appendix]
- Adversarial Logit Pairing
  Adversarial training by pairing the clean and adversarial logits.
- Deep Defense: Training DNNs with Improved Adversarial Robustness (NeurIPS 2018)
  Following the linearity assumption of the DeepFool method, DeepDefense pushes the decision boundary away from correctly classified samples and pulls it closer to misclassified ones.
- Max-Mahalanobis Linear Discriminant Analysis Networks (ICML 2018)
  This is one of our works. We explicitly model the feature distribution as a Max-Mahalanobis distribution (MMD), which has the maximal margin among classes and can lead to guaranteed robustness.
- Ensemble Adversarial Training: Attacks and Defenses (ICLR 2018)
  Ensemble adversarial training uses several pre-trained models, and in each training batch randomly selects either the currently trained model or one of the pre-trained models to craft adversarial examples.
- Pixeldefend: Leveraging generative models to understand and defend against adversarial examples (ICLR 2018)
  This paper defends by moving adversarial examples back towards the distribution seen in the training data.
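For reference, here is a minimal sketch of the TRADES-style trade-off objective mentioned above (clean cross-entropy plus a weighted KL term on adversarially perturbed inputs). This is an illustrative PyTorch rendition rather than the authors' reference implementation; `model`, `x`, `y` and the hyperparameter defaults are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, eps=8/255, step_size=2/255, steps=10, beta=6.0):
    """Sketch of a TRADES-style objective: clean CE + beta * KL(p(x) || p(x_adv))."""
    model.eval()
    p_clean = F.softmax(model(x), dim=1).detach()
    # Inner maximization: find x_adv inside the eps-ball that maximizes the KL term.
    x_adv = (x + 0.001 * torch.randn_like(x)).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss_kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean,
                           reduction='batchmean')
        grad = torch.autograd.grad(loss_kl, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    model.train()
    # Outer minimization: natural loss plus the robustness regularizer.
    logits_clean, logits_adv = model(x), model(x_adv)
    loss_nat = F.cross_entropy(logits_clean, y)
    loss_rob = F.kl_div(F.log_softmax(logits_adv, dim=1),
                        F.softmax(logits_clean, dim=1), reduction='batchmean')
    return loss_nat + beta * loss_rob
```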
- Adversarial Attacks are Reversible with Natural Supervision (ICCV 2021)
  This paper proposes to use a contrastive loss to restore the natural structure of attacked images, providing a defense.
- Adversarial Purification with Score-based Generative Models (ICML 2021)
  This paper proposes to use score-based generative models (e.g., NCSN) to purify adversarial examples.
- Online Adversarial Purification based on Self-Supervision (ICLR 2021)
  This paper proposes to train the network with a label-independent auxiliary task (e.g., rotation prediction), and to purify test inputs dynamically by minimizing the auxiliary loss.
- Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks (ICLR 2020)
  This paper exploits the mixup mechanism in the inference phase to improve robustness.
- Barrage of Random Transforms for Adversarially Robust Defense (CVPR 2019)
  This paper applies a set of different random transformations as an off-the-shelf defense.
- Mitigating Adversarial Effects Through Randomization (ICLR 2018)
  Uses random resizing and random padding to disturb adversarial examples, which won 2nd place in the defense track of the NeurIPS 2017 Adversarial Competition.
- Countering Adversarial Images Using Input Transformations (ICLR 2018)
  Applies bit-depth reduction, JPEG compression, total variance minimization, and image quilting as input preprocessing to defend against adversarial attacks (see the preprocessing sketch below).
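Below is a minimal sketch of two of the simpler input transformations named in the entry above (bit-depth reduction and JPEG round-tripping). Function names and parameter defaults are illustrative assumptions, not taken from the paper's released code.

```python
import io
import numpy as np
from PIL import Image

def squeeze_bit_depth(img: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize a float image in [0, 1] to `bits` bits per channel."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels

def jpeg_round_trip(img: np.ndarray, quality: int = 75) -> np.ndarray:
    """Round-trip an (H, W, 3) float image in [0, 1] through JPEG compression."""
    buf = io.BytesIO()
    Image.fromarray((img * 255).astype(np.uint8)).save(buf, format='JPEG', quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf)).astype(np.float32) / 255.0

# Usage: preprocess each test input before feeding it to the classifier, e.g.
# x_defended = jpeg_round_trip(squeeze_bit_depth(x))
```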
- Detecting adversarial examples is (nearly) as hard as classifying them (ICML 2022)
  This paper demonstrates that detection and classification of adversarial examples can be reduced to each other, and thus many previous works on detection may overclaim their effectiveness.
- Class-Disentanglement and Applications in Adversarial Detection and Defense (NeurIPS 2021)
  This paper proposes to disentangle class-dependent information from visual reconstruction, and exploits the result as an adversarial detection metric.
- Towards Robust Detection of Adversarial Examples (NeurIPS 2018)
  This is one of our works. We train the networks with reverse cross-entropy (RCE), which can map normal features to low-dimensional manifolds, so that detectors can better separate adversarial examples from normal ones.
- A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks (NeurIPS 2018)
  Fits a Gaussian discriminant analysis (GDA) model on learned features, and uses the Mahalanobis distance as the detection metric (see the Mahalanobis sketch below).
- Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks (NeurIPS 2018)
  Fits a GMM on learned features, and uses the probability as the detection metric.
- Detecting adversarial samples from artifacts
  This paper proposed the kernel density (K-density) metric on learned features to detect adversarial examples.
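A minimal sketch of a Mahalanobis-style detection score, as in the GDA-based entry above, assuming penultimate-layer `features` and `labels` have already been extracted as NumPy arrays. This is only an illustration; the paper's full method additionally uses input preprocessing and layer-wise score ensembling.

```python
import numpy as np

def fit_class_gaussians(features, labels):
    """Fit per-class means and a shared (tied) covariance on penultimate-layer features."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([features[labels == c] - means[c] for c in classes])
    precision = np.linalg.pinv(centered.T @ centered / len(features))
    return means, precision

def mahalanobis_score(feat, means, precision):
    """Confidence score: negative minimum Mahalanobis distance to any class mean."""
    dists = [float((feat - mu) @ precision @ (feat - mu)) for mu in means.values()]
    return -min(dists)  # lower scores suggest adversarial / out-of-distribution inputs
```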
- Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples (NeurIPS 2021)
  This paper systematically studies the efficiency of different certified defenses, and finds that the smoothness of the loss landscape matters.
- Towards Verifying Robustness of Neural Networks against Semantic Perturbations (CVPR 2020)
  This paper generalizes pixel-wise verification methods to the semantic transformation space.
- Neural Network Branching for Neural Network Verification (ICLR 2020)
  This paper uses a GNN to adaptively construct branching strategies for model verification.
- Towards Stable and Efficient Training of Verifiably Robust Neural Networks (ICLR 2020)
  This paper combines the previously proposed IBP and CROWN methods.
- A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks (NeurIPS 2019)
  This paper makes a comprehensive study of existing robustness verification methods based on convex relaxation.
- Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers (NeurIPS 2019)
  This work extends the robustness certificate of randomized smoothing from the L2 to the L0 norm bound.
- On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models (ICCV 2019)
  This paper proposes a scalable verification method based on interval bound propagation (IBP); see the IBP sketch below.
- Evaluating Robustness of Neural Networks with Mixed Integer Programming (ICLR 2019)
  This paper uses mixed integer programming (MIP) to solve the verification problem.
- Efficient Neural Network Robustness Certification with General Activation Functions (NeurIPS 2018)
  This paper proposes the CROWN verification method for general activation functions with locally linear or quadratic approximations.
- A Unified View of Piecewise Linear Neural Network Verification (NeurIPS 2018)
  This paper presents a unified framework and an empirical benchmark for previous verification methods.
- Scaling Provable Adversarial Defenses (NeurIPS 2018)
  This paper adds three tricks to improve the scalability (to CIFAR-10) of the method previously proposed at ICML.
- Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope (ICML 2018)
  By robust optimization (via a linear program), a pointwise robustness bound is obtained within which no adversarial example exists. Experiments are done on MNIST.
- Towards Fast Computation of Certified Robustness for ReLU Networks (ICML 2018)
  This paper proposes the Fast-Lin and Fast-Lip methods.
- Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach (ICLR 2018)
  This paper proposes the CLEVER method to estimate an upper bound of the specification.
- Certified Defenses against Adversarial Examples (ICLR 2018)
  This paper proposes certified training with a semidefinite relaxation.
- A Dual Approach to Scalable Verification of Deep Networks (UAI 2018)
  This paper solves the dual problem to provide an upper bound on the primal specification problem for verification.
- Reluplex: An efficient SMT solver for verifying deep neural networks (CAV 2017)
  This paper uses satisfiability modulo theory (SMT) solvers for the verification problem.
- Automated Verification of Neural Networks: Advances, Challenges and Perspectives
  This paper provides an overview of the main verification methods, and reviews previous work on combining automated verification with machine learning. It also gives some insights on future trends in combining these two domains.
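As a companion to the IBP entry above, here is a toy sketch of interval bound propagation through one linear layer and a ReLU in PyTorch. It is only illustrative and omits the certified training loss used in the papers; function names and the usage comment are assumptions for the example.

```python
import torch

def ibp_linear(lower, upper, weight, bias):
    """Propagate elementwise bounds [lower, upper] through y = x @ W^T + b."""
    center, radius = (upper + lower) / 2, (upper - lower) / 2
    new_center = center @ weight.t() + bias
    new_radius = radius @ weight.abs().t()
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lower, upper):
    """ReLU is monotone, so the bounds pass through elementwise."""
    return lower.clamp(min=0), upper.clamp(min=0)

# Usage: start from the eps-ball around an input and propagate layer by layer;
# the prediction is certified if the true class's lower logit bound exceeds
# every other class's upper logit bound.
# lower, upper = (x - eps).clamp(0, 1), (x + eps).clamp(0, 1)
```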
- Towards Deep Learning Models Resistant to Large Perturbations
  This paper proves that initializing from a model that is already robust to small perturbations can be helpful for training on large perturbations.
- Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin (ICLR 2020)
  This paper connects the generalization gap to an all-layer margin, and proposes a variant of adversarial training where perturbations can be imposed on each layer of the network.
- Adversarial Examples Are Not Bugs, They Are Features (NeurIPS 2019)
  They claim that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but locally quite sensitive.
- First-order Adversarial Vulnerability of Neural Networks and Input Dimension (ICML 2019)
  This paper demonstrates the relations among adversarial vulnerability, gradient norm, and input dimension with comprehensive empirical experiments.
- Adversarial Examples from Computational Constraints (ICML 2019)
  The authors argue that the existence of adversarial examples could stem from computational constraints.
- Adversarial Examples Are a Natural Consequence of Test Error in Noise (ICML 2019)
  This paper connects general corruption robustness with adversarial robustness, and recommends that adversarial defense methods also be tested on general-purpose noise.
- PAC-learning in the presence of evasion adversaries (NeurIPS 2018)
  The authors analyze adversarial attacks within the PAC-learning framework.
- Adversarial Vulnerability for Any Classifier (NeurIPS 2018)
  Provides a uniform upper bound on the robustness of any classifier for data sampled from smooth generative models.
- Adversarially Robust Generalization Requires More Data (NeurIPS 2018)
  This paper shows that robust generalization requires much higher sample complexity than standard generalization on two simple data distribution models.
- Robustness of Classifiers: From Adversarial to Random Noise (NeurIPS 2016)
- Aliasing and adversarial robust generalization of CNNs (ECML 2022)
  This paper empirically demonstrates that adversarially robust models learn to downsample more accurately and thus suffer significantly less from downsampling artifacts (aliasing) than simple non-robust baseline models.
- Adversarial Robustness Through the Lens of Convolutional Filters (CVPR-W 2022)
  This paper compares the learned convolution filters of a large number of pretrained robust models against identical networks trained without adversarial defenses. The authors show that robust models form more orthogonal, diverse, and less sparse convolution filters, but the differences diminish with increasing dataset complexity.
- CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters (CVPR 2022)
  This paper performs an empirical analysis of learned 3x3 convolution filters in various CNNs and shows that robust models learn less sparse and more diverse convolution filters.
- PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures (CVPR 2022)
  This paper uses dreamlike pictures as data augmentation to generally improve robustness (removing texture-based confounders).
- How Benign is Benign Overfitting (ICLR 2021)
  This paper shows that adversarial vulnerability may come from bad data and (poorly) trained models, namely, the learned representations.
- Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
  This paper explores the limits of adversarial training on CIFAR-10 by applying large model architectures, weight moving average, smooth activations, and more training data to achieve SOTA robustness under norm-bounded constraints.
- Bag of Tricks for Adversarial Training (ICLR 2021)
  This paper provides an empirical study of the usually overlooked hyperparameters used in adversarial training, and shows that inappropriate settings can largely affect the performance of adversarially trained models.
- Neural Anisotropy Directions (NeurIPS 2020)
  This paper shows that there exist directional inductive biases of model architectures, which can explain a model's reaction to certain adversarial perturbations.
- Hold me tight! Influence of discriminative features on deep network boundaries (NeurIPS 2020)
  This paper empirically shows that decision boundaries are constructed along discriminative features, and explains the mechanism of adversarial training.
- Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks (ICML 2020)
  A comprehensive empirical evaluation of some existing defense methods.
- Attacks Which Do Not Kill Training Make Adversarial Learning Stronger (ICML 2020)
  This paper also advocates early stopping during adversarial training.
- Overfitting in adversarially robust deep learning (ICML 2020)
  This paper shows the phenomenon of overfitting when training robust models through extensive empirical experiments (code provided in the paper).
- When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks
  This paper leverages NAS to understand the influence of network architectures on robustness against adversarial attacks. It reveals several useful observations on designing robust network architectures.
- Adversarial Examples Improve Image Recognition (CVPR 2020)
  This paper shows that using an auxiliary BN for adversarial examples can improve generalization performance.
- Intriguing Properties of Adversarial Training at Scale (ICLR 2020)
  This paper investigates the effects of BN and deeper models on adversarial training on ImageNet.
- A Fourier Perspective on Model Robustness in Computer Vision (NeurIPS 2019)
  This paper analyzes different types of noise (including adversarial ones) from the Fourier perspective, and observes some relationships between robustness and Fourier frequency.
- Interpreting Adversarially Trained Convolutional Neural Networks (ICML 2019)
  This paper shows that adversarially trained models can alleviate the texture bias and learn more shape-biased representations.
- On Evaluating Adversarial Robustness
  Some analyses on how to correctly evaluate the robustness of adversarial defenses.
- Is Robustness the Cost of Accuracy? -- A Comprehensive Study on the Robustness of 18 Deep Image Classification Models
  This paper empirically studies the effects of model architectures (trained on ImageNet) on robustness and accuracy.
- Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong
  This paper tests ensembles of existing detection-based defenses, and claims that these ensemble defenses can still be evaded by white-box attacks.
- Robust Models are less Over-Confident (NeurIPS 2022)
  This paper analyzes the (over)confidence of robust CNNs and concludes that robust models are significantly less overconfident in their decisions, even on clean data. Further, the authors provide a model zoo of various CNNs trained with and without adversarial defenses.
- Improved Autoregressive Modeling with Distribution Smoothing (ICLR 2021)
  This paper applies an idea similar to randomized smoothing to autoregressive generative modeling, first modeling a smoothed data distribution and then denoising the sampled data.
- Defending Against Image Corruptions Through Adversarial Augmentations
  This paper proposes the AdversarialAugment method to adversarially craft corrupted augmented images during training.
- On the effectiveness of adversarial training against common corruptions
  This paper studies how to use adversarial training (both Lp and a relaxation of perceptual adversarial training) to improve performance on common image corruptions (CIFAR-10-C / ImageNet-100-C).
- Unadversarial Examples: Designing Objects for Robust Vision (NeurIPS 2021)
  This paper turns the weakness of adversarial examples into a strength, and proposes unadversarial examples to enhance model performance and robustness.
- Self-supervised Learning with Adversarial Training (1, 2, 3) (NeurIPS 2020)
  These three papers embed the adversarial training mechanism into contrastive self-supervised learning, and show that it can improve the learned representations.
- Do Adversarially Robust ImageNet Models Transfer Better? (NeurIPS 2020)
  This paper shows that an adversarially robust model can work better for transfer learning, since adversarial training encourages the learning process to focus on semantic features.
- Adversarial Examples Improve Image Recognition (CVPR 2020)
  This paper treats adversarial training as a regularization strategy for the traditional classification task, and achieves SOTA clean performance on ImageNet without extra data.
- Unsolved Problems in ML Safety
  A comprehensive roadmap for future research in Trustworthy ML.
- Towards Deep Learning Models Resistant to Adversarial Attacks (ICLR 2018)
  This paper proposed the projected gradient descent (PGD) attack and PGD-based adversarial training (see the FGSM/PGD sketches below).
- Adversarial examples are not easily detected: Bypassing ten detection methods (AISec 17)
  This paper first designed different adaptive attacks against detection-based methods.
- Explaining and Harnessing Adversarial Examples (ICLR 2015)
  This paper proposed the fast gradient sign method (FGSM), and the framework of adversarial training.
- Intriguing properties of neural networks (ICLR 2014)
  This paper first introduced the concept of adversarial examples in deep learning, and provided an L-BFGS-based attack method.
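For readers new to the area, here are minimal PyTorch sketches of the FGSM and PGD attacks described in the entries above. The hyperparameter defaults (eps, step size, number of steps) are illustrative, and `model`, `x`, `y` are assumed to be a classifier and a batch of inputs/labels scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """Fast Gradient Sign Method: a single signed-gradient step."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps=8/255, step_size=2/255, steps=10):
    """Projected Gradient Descent: iterated FGSM steps projected back into the eps-ball."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)  # random start
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```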
- RobustBench: a standardized adversarial robustness benchmark
  A standardized robustness benchmark with 50+ models, together with the Model Zoo.
- Natural adversarial examples
  The ImageNet-A dataset.
- Benchmarking Neural Network Robustness to Common Corruptions and Perturbations (ICLR 2019)
  The ImageNet-C dataset.
- ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness (ICLR 2019)
  This paper empirically demonstrates that shape-based features lead to more robust models. The authors also provide the Stylized-ImageNet dataset.