This is a pytorch implemention of the ECCV2022 paper
Are vision transformers robust to patch perturbations?

This work discovered that ViTs are more robust to naturally corrupted patches than CNNs, whereas they are more vulnerable to adversarial patches. The reason behind this is as follows:

The attention mechanism can help improve the robustness of Vision Transformer by effectively ignoring natural corrupted patches.

python main.py --pert_type patch_corrupt

The model attention of an input with natural patch corruption is visualized as follows:

When ViTs are attacked by an adversary, the attention mechanism can be easily fooled to focus more on the adversarially perturbed patches and cause a mistake.

python main.py --pert_type patch_attack

The model attention of an input with natural patch corruption is visualized as follows:

If this repo is helpful for you, please cite our work.

@inproceedings{gu2022vision,
  title={Are vision transformers robust to patch perturbations?},
  author={Gu, Jindong and Tresp, Volker and Qin, Yao},
  booktitle={European Conference on Computer Vision},
  pages={404--421},
  year={2022},
  organization={Springer}
}

JindongGu/ViT_Patch_Robustness

This is a pytorch implemention of the ECCV2022 paper Are vision transformers robust to patch perturbations?

This is a pytorch implemention of the ECCV2022 paper
Are vision transformers robust to patch perturbations?