torchadver

A PyTorch toolbox for creating adversarial examples that fool neural networks.


Introduction

torchadver is a PyTorch toolbox for generating adversarial images. Basic adversarial attacks such as FGSM, I-FGSM, MI-FGSM, M-DI-FGSM, and C&W are implemented.
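
All of these attacks share the same core idea: perturb the input in the direction that increases the model's loss, subject to a norm bound. As a minimal sketch, the classic one-step FGSM update can be written directly in PyTorch (for illustration only; this is not torchadver's internal implementation):

import torch
import torch.nn as nn

def fgsm_step(model, images, labels, eps):
    # One FGSM step: move each input value by eps in the direction
    # of the sign of the loss gradient (an Linf-bounded perturbation).
    images = images.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(images), labels)
    loss.backward()
    return (images + eps * images.grad.sign()).detach()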

Installation

How to Use

The basic attack workflow is shown below. For a more detailed walkthrough, refer to ./examples/toturial.py.

Generate adversarial images under an L2 norm bound

Non-targeted attack

import torch.nn as nn
from torchadver.attacker.iterative_gradient_attack import FGM_L2, I_FGM_L2, MI_FGM_L2, M_DI_FGM_L2


mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]

# images normalized by mean and std
images, labels = ...
model = ...

# mean and std determine the valid per-channel pixel range of the normalized images.
attacker = FGM_L2(model, loss_fn=nn.CrossEntropyLoss(),
				  mean=mean, std=std, 
				  max_norm=4.0, # L2 norm bound
				  random_init=True)

# for non-targeted attack
adv_images = attacker.attack(images, labels) # or adv_images = attacker.attack(images)
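
After the attack, it is worth checking both the perturbation size and the fooling rate. A quick sanity check in plain PyTorch (whether max_norm is enforced in normalized or raw pixel space is an assumption here; consult the library for the exact convention):

import torch

# Per-image L2 norm of the perturbation; it should not exceed max_norm.
delta = (adv_images - images).flatten(start_dim=1)
print(delta.norm(p=2, dim=1))

# Fooling rate: fraction of adversarial images no longer classified as the true label.
with torch.no_grad():
    preds = model(adv_images).argmax(dim=1)
print((preds != labels).float().mean().item())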

Targeted attack

import torch.nn as nn
from torchadver.attacker.iterative_gradient_attack import FGM_L2, I_FGM_L2, MI_FGM_L2, M_DI_FGM_L2


mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]

# images normalized by mean and std
images, labels = ...
model = ...
targeted_labels = ...

# mean and std determine the valid per-channel pixel range of the normalized images.
attacker = FGM_L2(model, loss_fn=nn.CrossEntropyLoss(),
				  mean=mean, std=std, 
				  max_norm=4.0, # L2 norm bound
                  random_init=True, targeted=True)

# for targeted attack
adv_images = attacker.attack(images, targeted_labels)
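
A targeted attack only succeeds if the model predicts the chosen target class, not merely a wrong one, so the success check differs from the non-targeted case. A quick measure in plain PyTorch:

import torch

# Targeted success rate: fraction of adversarial images classified as the target label.
with torch.no_grad():
    preds = model(adv_images).argmax(dim=1)
print((preds == targeted_labels).float().mean().item())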

Generate adversarial images under an Linf norm bound

Non-targeted attack

import torch.nn as nn
from torchadver.attacker.iterative_gradient_attack import FGM_LInf, I_FGM_LInf, MI_FGM_LInf, M_DI_FGM_LInf


mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]

# images normalized by mean and std
images, labels = ...
model = ...

# mean and std determine the valid per-channel pixel range of the normalized images.
attacker = FGM_LInf(model, loss_fn=nn.CrossEntropyLoss(),
				 mean=mean, std=std,
				 max_norm=0.1, # Linf norm bound
				 random_init=True)

# for non-targeted attack
adv_images = attacker.attack(images, labels) # or adv_images = attacker.attack(images)
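
Under the Linf bound, no single value may change by more than max_norm. A one-line check (again assuming the bound applies in the normalized input space):

# Largest absolute change to any value; it should not exceed max_norm (0.1 here).
print((adv_images - images).abs().max().item())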

Targeted attack

import torch.nn as nn
from torchadver.attacker.iterative_gradient_attack import FGM_LInf, I_FGM_LInf, MI_FGM_LInf, M_DI_FGM_LInf


mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]

# images normalized by mean and std
images, labels = ...
model = ...
targeted_labels = ...

# mean and std determine the valid per-channel pixel range of the normalized images.
attacker = FGM_LInf(model, loss_fn=nn.CrossEntropyLoss(),
				 mean=mean, std=std,
				 max_norm=0.1, # Linf norm bound
				 random_init=True, targeted=True)

# for targeted attack
adv_images = attacker.attack(images, targeted_labels)
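
Since the attackers operate on normalized tensors, adversarial images should be denormalized before saving or visual inspection. A sketch using torchvision's save_image (the file name is illustrative):

import torch
from torchvision.utils import save_image

# Undo the channel-wise (x - mean) / std normalization, then clamp to [0, 1].
mean_t = torch.tensor(mean).view(1, -1, 1, 1)
std_t = torch.tensor(std).view(1, -1, 1, 1)
pixels = (adv_images.cpu() * std_t + mean_t).clamp(0, 1)
save_image(pixels, "adv_batch.png")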

Citations

For more information about adversarial attacks on deep learning, refer to awesome-adversarial-deep-learning.