Foolbox is a Python toolbox to create adversarial examples that fool neural networks. It requires Python, NumPy and SciPy.
# Foolbox 1.8
pip install foolbox
# Foolbox 2.0 beta
pip install foolbox --pre
Since version 2.0, Foolbox requires Python 3.5 or newer.
Documentation is available on readthedocs: http://foolbox.readthedocs.io/
For the 2.0 beta, please go to https://foolbox.readthedocs.io/en/latest/
Our paper describing Foolbox is on arXiv: https://arxiv.org/abs/1707.04131
import foolbox
import keras
import numpy as np
from keras.applications.resnet50 import ResNet50
# instantiate model
keras.backend.set_learning_phase(0)
kmodel = ResNet50(weights='imagenet')
preprocessing = (np.array([104, 116, 123]), 1)
fmodel = foolbox.models.KerasModel(kmodel, bounds=(0, 255), preprocessing=preprocessing)
# get source image and label
image, label = foolbox.utils.imagenet_example()
# apply attack on source image
# ::-1 reverses the color channels, because Keras ResNet50 expects BGR instead of RGB
attack = foolbox.attacks.FGSM(fmodel)
adversarial = attack(image[:, :, ::-1], label)
# if the attack fails, adversarial will be None and a warning will be printed
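Because a failed attack returns None, downstream code should check for that before using the result. The following is a minimal sketch, using only NumPy, of how the size of a perturbation could be summarized. `perturbation_summary` is a hypothetical helper and not part of Foolbox, and the small arrays are synthetic stand-ins for a real image and attack result.

```python
import numpy as np


def perturbation_summary(image, adversarial):
    """Return size measures for an adversarial perturbation, or None.

    Hypothetical helper (not part of Foolbox). Both arrays are assumed to
    have the same shape and to use the same channel order and value range.
    """
    if adversarial is None:
        # The attack did not find an adversarial example.
        return None
    diff = adversarial.astype(np.float64) - image.astype(np.float64)
    return {
        "l2": float(np.sqrt(np.sum(diff ** 2))),
        "linf": float(np.max(np.abs(diff))),
        "mse": float(np.mean(diff ** 2)),
    }


# Synthetic stand-ins for a real image and attack result
original = np.zeros((2, 2, 3), dtype=np.float32)
perturbed = original.copy()
perturbed[0, 0, 0] = 3.0
perturbed[1, 1, 2] = -4.0

summary = perturbation_summary(original, perturbed)
failed = perturbation_summary(original, None)
```

With the Keras example above, the two arguments would be `image[:, :, ::-1]` and `adversarial`, since both are in BGR order.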
For more examples, have a look at the documentation.
Finally, the result can be plotted like this:
# if you use Jupyter notebooks
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
plt.subplot(1, 3, 1)
plt.title('Original')
plt.imshow(image / 255) # division by 255 to convert [0, 255] to [0, 1]
plt.axis('off')
plt.subplot(1, 3, 2)
plt.title('Adversarial')
plt.imshow(adversarial[:, :, ::-1] / 255) # ::-1 to convert BGR to RGB
plt.axis('off')
plt.subplot(1, 3, 3)
plt.title('Difference')
difference = adversarial[:, :, ::-1] - image
plt.imshow(difference / abs(difference).max() * 0.2 + 0.5)
plt.axis('off')
plt.show()
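The scaling of the difference image divides by its largest absolute value, which fails if the two images happen to be identical. A minimal sketch of the same scaling with a guard for that case follows; `scale_difference` and its `contrast` parameter are hypothetical names introduced for illustration.

```python
import numpy as np


def scale_difference(difference, contrast=0.2):
    """Map a signed difference image into [0.5 - contrast, 0.5 + contrast].

    Hypothetical helper mirroring the scaling used in the plotting code,
    with a guard for the case where the difference is zero everywhere.
    """
    difference = np.asarray(difference, dtype=np.float64)
    max_abs = np.abs(difference).max()
    if max_abs == 0:
        # No perturbation: show a uniform mid-gray image.
        return np.full(difference.shape, 0.5)
    return difference / max_abs * contrast + 0.5


scaled = scale_difference(np.array([[-2.0, 0.0], [1.0, 2.0]]))
flat = scale_difference(np.zeros((2, 2)))
```

The result can be passed directly to `plt.imshow`, since all values stay within [0, 1] for any `contrast` up to 0.5.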
Interfaces for a range of other deep learning packages such as TensorFlow, PyTorch, Theano, Lasagne and MXNet are available, e.g.
model = foolbox.models.TensorFlowModel(images, logits, bounds=(0, 255))
model = foolbox.models.PyTorchModel(torchmodel, bounds=(0, 255), num_classes=1000)
# etc.
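Each wrapper takes a `bounds` argument giving the value range of the inputs, and the Keras example earlier also passes a `preprocessing` pair. The following is a minimal NumPy sketch, assuming that pair is interpreted as (mean, std), with the mean subtracted and the result divided by std before the input reaches the model. `apply_preprocessing` is a hypothetical name, not a Foolbox function.

```python
import numpy as np


def apply_preprocessing(x, preprocessing, bounds=(0, 255)):
    """Subtract the mean and divide by the std of a (mean, std) pair.

    Hypothetical helper; it only illustrates the assumed semantics of the
    `preprocessing` and `bounds` arguments and is not Foolbox code.
    """
    lower, upper = bounds
    x = np.asarray(x, dtype=np.float64)
    if x.min() < lower or x.max() > upper:
        raise ValueError("input values fall outside the given bounds")
    mean, std = preprocessing
    return (x - mean) / std


# One BGR pixel equal to the channel means from the Keras example
pixel = np.array([[[104.0, 116.0, 123.0]]])
preprocessing = (np.array([104, 116, 123]), 1)
centered = apply_preprocessing(pixel, preprocessing)
```

Keeping the mean subtraction inside the wrapper means that attacks can work on inputs in the original [0, 255] range.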
Different adversarial criteria can be passed to the attack, such as top-k misclassification, a specific target class, or target probability values for the original or the target class, e.g.
criterion = foolbox.criteria.TargetClass(22)
attack = foolbox.attacks.LBFGSAttack(fmodel, criterion)
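Conceptually, a criterion decides from the model's output whether a perturbed input counts as adversarial. The following is a minimal NumPy sketch of that decision for a target class and for top-k misclassification. It is not Foolbox's implementation, and the function names are hypothetical.

```python
import numpy as np


def is_target_class(logits, target_class):
    """True if the model's top prediction is the target class."""
    return int(np.argmax(logits)) == target_class


def is_top_k_misclassification(logits, original_class, k):
    """True if the original class is not among the k highest logits."""
    top_k = np.argsort(logits)[-k:]
    return original_class not in top_k


# Made-up logits for a four-class model; class 1 scores highest
logits = np.array([0.1, 2.5, 0.3, 1.7])

hits_target = is_target_class(logits, target_class=1)
out_of_top1 = is_top_k_misclassification(logits, original_class=3, k=1)
out_of_top2 = is_top_k_misclassification(logits, original_class=3, k=2)
```

Here class 3 is outside the top 1 but inside the top 2, so only the weaker top-1 criterion would accept this input as adversarial.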
We welcome feature requests and bug reports. Just create a new issue on GitHub.
Depending on the nature of your question, feel free to post it as an issue on GitHub or as a question on Stack Overflow using the foolbox tag. We will try to monitor that tag, but if you don't get an answer, don't hesitate to contact us.
Before you post a question, please check our FAQ and our documentation on Read the Docs.
Foolbox is a work in progress and any input is welcome.
In particular, we encourage users of deep learning frameworks for which we do not yet have built-in support, e.g. Caffe, Caffe2 or CNTK, to contribute the necessary wrappers. Don't hesitate to contact us if we can be of any help.
Moreover, attack developers are encouraged to share their reference implementations using Foolbox so that they will be available to everyone.
If you find Foolbox useful for your scientific work, please consider citing it in resulting publications:
@article{rauber2017foolbox,
  title={Foolbox: A Python toolbox to benchmark the robustness of machine learning models},
  author={Rauber, Jonas and Brendel, Wieland and Bethge, Matthias},
  journal={arXiv preprint arXiv:1707.04131},
  year={2017},
  url={http://arxiv.org/abs/1707.04131},
  archivePrefix={arXiv},
  eprint={1707.04131},
}
You can find the paper on arXiv: https://arxiv.org/abs/1707.04131