/ForschungsArbeit

Primary Language: Jupyter Notebook

Structured Comparison of Metrics to Evaluate the Robustness of Image Classification Models

Abstract

This research evaluates the robustness of image classification models against adversarial attacks using two metrics: Adversarial Distance and CLEVER. The study uses variants of the WideResNet model trained on the CIFAR-10 dataset, including a standard and a corruption-trained robust model. Key findings include:

  • Adversarial Distance Metric: Provides an upper bound on the minimal perturbation needed to change a prediction.
  • CLEVER Metric: Provides an estimated lower bound on that perturbation.
  • Model Comparison: The corruption-trained robust model exhibits greater resilience against adversarial examples than the standard model.
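
The two metrics can be read as bracketing the same quantity. The relation below is a sketch under stated assumptions: the symbols $f$ (classifier), $\Delta^{*}(x)$ (true minimal $L_\infty$ perturbation for image $x$), and $x_{\mathrm{adv}}$ (an adversarial example found by an attack) are introduced here for illustration and do not appear in the project itself.

```latex
\mathrm{CLEVER}(x)
\;\lesssim\;
\Delta^{*}(x) = \min_{\delta}\bigl\{\, \|\delta\|_\infty : f(x+\delta) \neq f(x) \,\bigr\}
\;\le\;
\|x_{\mathrm{adv}} - x\|_\infty
```

The right-hand inequality holds for any adversarial example an attack actually finds. The left-hand relation is only approximate, because CLEVER is a sampling-based estimate rather than a certified bound, which is consistent with the CLEVER score falling below the Adversarial Distance for about 86% of images rather than all of them.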

Introduction

Advancements in deep learning have highlighted vulnerabilities in image classifiers to adversarial attacks. This project focuses on a systematic comparison of robustness metrics using the WideResNet architecture.

Methodology

  • Dataset: CIFAR-10
  • Models: Standard WideResNet and corruption-trained WideResNet
  • Metrics: Adversarial Distance (L∞ norm) and CLEVER score
  • Tools: PyTorch and the CleverHans library for implementing attacks and measuring robustness.
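
To make the Adversarial Distance metric concrete, the following is a minimal, framework-free sketch of one way such a distance could be measured: search for the smallest L∞ budget at which an attack succeeds. It is not the project's implementation. The bisection strategy, the function names `linf_adversarial_distance` and `make_toy_attack`, and the toy linear classifier are all assumptions made for illustration; in the project, the attack would be a projected gradient attack on a WideResNet, and pixel-range clipping (ignored here) would also apply.

```python
from typing import Callable, Sequence


def linf_adversarial_distance(
    attack_succeeds: Callable[[float], bool],
    eps_max: float = 1.0,
    tol: float = 1e-4,
) -> float:
    """Smallest L-inf budget (up to tol) at which the attack succeeds.

    Returns float('inf') if the attack fails even at eps_max. Assumes that
    success is monotone in eps, which is exact for the toy attack below but
    only approximately true for iterative attacks.
    """
    if not attack_succeeds(eps_max):
        return float("inf")
    lo, hi = 0.0, eps_max  # invariant: the attack succeeds at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if attack_succeeds(mid):
            hi = mid
        else:
            lo = mid
    return hi


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def make_toy_attack(
    w: Sequence[float], b: float, x: Sequence[float]
) -> Callable[[float], bool]:
    """Hypothetical stand-in for an attack, on a linear binary classifier."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    step = -1.0 if score > 0 else 1.0  # push the score toward the other class

    def attack_succeeds(eps: float) -> bool:
        # For a linear model, the strongest L-inf perturbation of size eps
        # shifts every input coordinate by eps along sign(w).
        x_adv = [xi + step * eps * sign(wi) for wi, xi in zip(w, x)]
        adv_score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
        return (adv_score > 0) != (score > 0)

    return attack_succeeds


# Made-up weights and input, not data from the study.
w, b, x = [2.0, -1.0, 0.5], 0.1, [0.3, 0.2, 0.4]
distance = linf_adversarial_distance(make_toy_attack(w, b, x))

# Closed-form minimal distance for a linear model: |score| / ||w||_1.
exact = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / sum(abs(wi) for wi in w)
print(distance, exact)
```

For the linear toy model the search recovers the closed-form minimal distance to within `tol`. For a deep network no such closed form exists, and the value found by an attack is only an upper bound on the true minimal perturbation.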

Results

[Figure panels: Standard Model, Robust Model]

Using the CIFAR-10 test set as a benchmark, both metrics proved consistently effective. The CLEVER score was smaller than the Adversarial Distance for approximately 86% of the images. It therefore does not act as a universal lower bound, but for most images, and for both model variants, it gives a lower bound on the perturbation magnitude required by projected gradient attacks, while the Adversarial Distance serves as the upper bound.

The corruption-trained robust model behaved broadly like the standard model when faced with adversarial examples. It deviated only slightly, in a direction that indicates greater resilience. Its scatter plot of adversarial distances against CLEVER scores was also cleaner, with adversarial distances aligning more closely with CLEVER scores than for the standard model.

These findings improve our understanding of model behavior in adversarial scenarios and underline the value of the two metrics for quantifying and evaluating model robustness.
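
The 86% figure is the share of test images whose CLEVER score is smaller than their Adversarial Distance. As a minimal sketch of how such a share can be computed from per-image values, the snippet below uses a hypothetical helper, `lower_bound_rate`, and made-up numbers that are not the study's data.

```python
from typing import Sequence


def lower_bound_rate(
    clever_scores: Sequence[float],
    adversarial_distances: Sequence[float],
) -> float:
    """Fraction of images whose CLEVER score is below the adversarial distance."""
    if len(clever_scores) != len(adversarial_distances):
        raise ValueError("both sequences must have one entry per image")
    if not clever_scores:
        raise ValueError("at least one image is required")
    below = sum(c < d for c, d in zip(clever_scores, adversarial_distances))
    return below / len(clever_scores)


# Placeholder per-image values for illustration only.
clever = [0.010, 0.020, 0.015, 0.030, 0.008]
adv_dist = [0.012, 0.018, 0.020, 0.031, 0.016]

rate = lower_bound_rate(clever, adv_dist)
print(f"CLEVER < Adversarial Distance for {rate:.0%} of images")
```

In this toy input the second image violates the bound, so four of the five images count toward the rate.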

Conclusion

The study underscores the importance of evaluating multiple robustness metrics to ensure the reliability of neural networks in adversarial scenarios. The corruption-trained model demonstrates a promising approach to enhancing model robustness.