adversarial-pytorch

Implementation of papers on adversarial attacks and defences.

Dependencies

  • Python3
  • PyTorch
  • OpenCV
  • NumPy
  • SciPy

Fast Gradient Sign Method

Paper: Explaining and Harnessing Adversarial Examples

Usage

  • Run the script
$ python3 fgsm.py --img images/goldfish.jpg --model inception_v3
  • Control keys

    • Use the trackbar to change epsilon (max norm of the perturbation)
    • esc: close
    • s: save the perturbation and adversarial image
  • Demo

fgsm.gif
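
For reference, a rough sketch of the FGSM update performed by fgsm.py (the script itself adds the OpenCV display and trackbar handling). Here model, img, and label are assumptions: a pretrained classifier in eval mode, a normalized 1x3xHxW input tensor, and a length-1 label tensor.

import torch
import torch.nn.functional as F

def fgsm(model, img, label, eps):
    # img: 1x3xHxW input, label: tensor([class_index]); both supplied by the caller.
    img = img.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(img), label)
    loss.backward()
    # Single step in the direction of the gradient sign, bounded by eps (max norm).
    perturbation = eps * img.grad.sign()
    adv = (img + perturbation).detach()
    return adv, perturbation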


Basic Iterative Method (Targeted and Non-targeted)

Paper: Adversarial examples in the physical world

Usage

  • Run the script
$ python3 iterative.py --img images/goldfish.jpg --model resnet18 --y_target 4
  • Control keys

    • Use the trackbar to change epsilon (max norm of the perturbation) and iter (number of iterations)
    • esc: close, space: pause
    • s: save the perturbation and adversarial image
  • Demo
    fgsm.gif
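
A sketch of the iterative update, covering both the non-targeted case and the targeted case (--y_target given). As with the FGSM sketch above, model, img, label, and the per-step size alpha are assumptions, and clipping to the valid image range is omitted here.

import torch
import torch.nn.functional as F

def iterative_attack(model, img, label, eps, alpha, iters, y_target=None):
    # Non-targeted: ascend the loss of the true label.
    # Targeted:     descend the loss of the target label (y_target).
    adv = img.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), y_target if y_target is not None else label)
        grad = torch.autograd.grad(loss, adv)[0]
        step = -alpha * grad.sign() if y_target is not None else alpha * grad.sign()
        # Keep the accumulated perturbation inside the eps ball (max norm constraint).
        adv = (img + (adv + step - img).clamp(-eps, eps)).detach()
    return adv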


From one of the first papers on adversarial examples, Explaining and Harnessing Adversarial Examples by Ian Goodfellow:

The direction of perturbation, rather than the specific point in space, matters most. Space is not full of pockets of adversarial examples that finely tile the reals like the rational numbers.

To test this, I've written explore_space.py.

fgsm.gif

This code adds a randomly generated perturbation (vec1), subject to a max norm constraint eps, to the input image (img). To explore a region (a hypersphere) around this adversarial image (img + vec1), we add to it another perturbation (vec2), which is constrained by its L2 norm rad. Pressing the keys e and r generates a new vec1 and vec2 respectively.
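
A sketch of the two perturbations described above. The names vec1, vec2, eps, and rad follow the description; the input tensor and the numeric values below are placeholders, not the script's defaults.

import torch

img = torch.rand(3, 224, 224)          # placeholder input in [0, 1]
eps, rad = 8 / 255, 2.0                # placeholder constraints

# vec1: random noise under a max norm (L-infinity) constraint eps
vec1 = torch.empty_like(img).uniform_(-eps, eps)

# vec2: random noise rescaled so that its L2 norm equals rad,
# i.e. a point on the hypersphere of radius rad around img + vec1
vec2 = torch.randn_like(img)
vec2 = rad * vec2 / vec2.norm()

adv = (img + vec1 + vec2).clamp(0, 1)  # pressing 'e' / 'r' resamples vec1 / vec2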

  • Random perturbations
    The classifier is robust to these random perturbations even though their max norm is significantly higher than that of the directed FGSM perturbation below.
(Demos: horse_explore, automobile_explore, truck_explore; predicted classes: horse, automobile, truck)
  • Perturbation generated by FGSM
    A properly directed perturbation with a max norm as low as 3, which is almost imperceptible, can fool the classifier.
(Demo: horse_scaled, horse_adversarial, perturbation; original: horse, predicted: dog, perturbation eps = 6)

One Pixel Attack

Paper: One pixel attack for fooling deep neural networks

  • Run the script
$ python3 one_pixel.py --img images/airplane.jpg --d 3 --iters 600 --popsize 10

Here, d is the number of pixels to change (the L0 norm of the perturbation). Finding successful attacks was difficult; I noticed that only images classified with low confidence gave successful attacks.

(Demo: cat misclassified as frog [0.8000], airplane misclassified as bird [0.8073])
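
A rough sketch of the attack using SciPy's differential evolution, the optimizer family the paper uses. Here model, img (a 3xHxW tensor in [0, 1]), and true_label are assumptions, and the encoding and hyperparameters in one_pixel.py may differ.

import torch
from scipy.optimize import differential_evolution

def one_pixel_attack(model, img, true_label, d=1, iters=100, popsize=10):
    h, w = img.shape[1], img.shape[2]
    # Each modified pixel is encoded as (x, y, r, g, b) -> 5 * d parameters in total.
    bounds = [(0, w - 1), (0, h - 1), (0, 1), (0, 1), (0, 1)] * d

    def perturb(params):
        adv = img.clone()
        for i in range(d):
            x, y, r, g, b = params[i * 5:(i + 1) * 5]
            adv[:, int(y), int(x)] = torch.tensor([r, g, b], dtype=adv.dtype)
        return adv

    def objective(params):
        # Differential evolution minimises this: the confidence of the true class.
        with torch.no_grad():
            probs = torch.softmax(model(perturb(params).unsqueeze(0)), dim=1)
        return probs[0, true_label].item()

    result = differential_evolution(objective, bounds, maxiter=iters, popsize=popsize, seed=0)
    return perturb(result.x)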

Generative Adversarial Trainer

Paper: Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN

Architecture

gat architecture

  • To train, run python3 gat.py
  • Training has not been completed because the generator doesn't converge. Also, I'm a bit skeptical about this approach; they did not show any adversarial images in the paper.

Adversarial Spheres

Paper: Adversarial Spheres


AdvGAN

Paper: Generating Adversarial Examples with Adversarial Networks

[WIP]

Fast Feature Fool

Paper: Fast Feature Fool: A data independent approach to universal adversarial perturbations

[WIP]

ToDo: