TAPADL

Robustifying Token Attention for Vision Transformers

Yong Guo, David Stutz, and Bernt Schiele. ICCV 2023.

This repository contains the official PyTorch implementation and the pretrained models of "Robustifying Token Attention for Vision Transformers".

Catalog

  • Pre-trained models for image classification
  • Pre-trained models for semantic segmentation
  • Evaluation and Training Code

Dependencies

Our code is built on PyTorch and the timm library. Please check requirements.txt for the detailed dependencies.
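
After installing, a quick way to verify the two core dependencies is the minimal check below (the exact pinned versions are listed in requirements.txt):

    # Minimal sanity check: confirm that PyTorch and timm import correctly.
    # The exact pinned versions are listed in requirements.txt.
    import torch
    import timm
    print("torch:", torch.__version__, "| timm:", timm.__version__)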

Dataset Preparation

  • Image Classification: ImageNet and related robustness benchmarks

Please download the clean ImageNet dataset. We evaluate the models on various robustness benchmarks, including ImageNet-C, ImageNet-A, ImageNet-P, and ImageNet-R (see the ImageNet-C evaluation sketch after this list).

  • Semantic Segmentation: Cityscapes and related robustness benchmarks

Please download the clean Cityscapes dataset. We evaluate the models on various robustness benchmarks, including Cityscapes-C and ACDC (test set); see the Cityscapes-C generation sketch below.
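
For reference, here is a hedged sketch of iterating over ImageNet-C for evaluation. The root path and the standard corruption/severity/class folder layout are our assumptions, not something this repository prescribes:

    # Hedged sketch: walk the usual ImageNet-C layout
    # <root>/<corruption>/<severity>/<class>/<image>; the root path is hypothetical.
    import os
    import torch
    from torchvision import datasets, transforms

    root = "data/imagenet-c"  # hypothetical location
    tf = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    for corruption in sorted(os.listdir(root)):
        for severity in map(str, range(1, 6)):
            ds = datasets.ImageFolder(os.path.join(root, corruption, severity), tf)
            loader = torch.utils.data.DataLoader(ds, batch_size=64, num_workers=4)
            # ... run the model on `loader` and record top-1 per (corruption, severity)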
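
Cityscapes-C is usually generated by applying the ImageNet-C corruption functions to the Cityscapes validation images. Below is a hedged sketch using the third-party imagecorruptions package (an assumption on our part, not a dependency of this repository), with hypothetical paths:

    # Hedged sketch: generate Cityscapes-C-style images with the
    # imagecorruptions package (pip install imagecorruptions); paths are hypothetical.
    import os
    import numpy as np
    from PIL import Image
    from imagecorruptions import corrupt, get_corruption_names

    img = np.asarray(Image.open("leftImg8bit/val/frankfurt/example.png").convert("RGB"))
    for name in get_corruption_names():      # e.g. 'gaussian_noise', 'fog', ...
        for severity in range(1, 6):
            out = corrupt(img, corruption_name=name, severity=severity)
            out_dir = os.path.join("cityscapes-c", name, str(severity))
            os.makedirs(out_dir, exist_ok=True)
            Image.fromarray(out).save(os.path.join(out_dir, "example.png"))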

Training and Evaluation (using TAP and ADL)

  • Image Classification:

    Please see how to train/evaluate FAN and RVT models in TAPADL_FAN and TAPADL_RVT, respectively.

  • Semantic Segmentation:

    Please see how to train/evaluate our segmentation model in TAPADL_FAN/segmentation.
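
As a rough illustration of the evaluation setup in those directories, loading a released checkpoint typically looks like the sketch below. The model constructor, file name, and state-dict key are our assumptions; the authoritative commands live in TAPADL_FAN and TAPADL_RVT:

    # Hedged sketch: load a released checkpoint for evaluation.
    # build_model() stands in for the actual constructors in TAPADL_RVT / TAPADL_FAN;
    # the file name and the 'model' key are assumptions.
    import torch

    model = build_model()                                          # hypothetical constructor
    ckpt = torch.load("tapadl_rvt_small.pth", map_location="cpu")  # hypothetical file
    state_dict = ckpt.get("model", ckpt)  # timm-style checkpoints often nest under 'model'
    model.load_state_dict(state_dict)
    model.eval()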

Acknowledgement

This repository is built on the timm library and the RVT and FAN repositories.

Citation

If you find this repository helpful, please consider citing:

@inproceedings{guo2023robustifying,
  title={Robustifying token attention for vision transformers},
  author={Guo, Yong and Stutz, David and Schiele, Bernt},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2023}
}