/fooling-NNs

Primary LanguagePythonMIT LicenseMIT

fooling-NNs

This project implements an adversarial attack on neural networks by optimizing on images to maximize the likelihood of false predictions.

We follow Szegedy et al.’s Intriguing properties of neural networks [1], but optimize using gradient descent instead of L-BFGS to study the incremental properties of attacks. We also perform analysis on the transferability of these attacks and the ease of fooling across classes.

[1] https://arxiv.org/abs/1312.6199