This code provides an implementation of the paper "Part-Guided Relational Transformers for Fine-Grained Visual Recognition". The paper addresses fine-grained recognition in one unified framework from two aspects: constructing feature-level interrelationships and capturing part-level discriminative features. This framework, named PArt-guided Relational Transformers (PART), learns discriminative part features with an automatic part discovery module and explores intrinsic correlations with a feature transformation module that adapts Transformer models.
PyTorch>=1.3, tqdm, torchvision, Pillow, OpenCV (cv2).
The code is written for multi-GPU training (2 GPUs are recommended) and has been tested on 2 NVIDIA RTX 3090s or 2 NVIDIA RTX 2080 Tis.
On other GPU configurations, the hyperparameters, including the batch size, should be adjusted to achieve similar results.
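To sanity-check the 2-GPU setup before a long run, here is a minimal sketch; the `nn.DataParallel` wrapping is only an assumption about the launch style, and the repo's actual entry point may differ:

```python
# Minimal sketch: verify that two GPUs are visible and wrap a model for them.
# nn.DataParallel is an illustration only; the repo may launch differently.
import torch
import torch.nn as nn

assert torch.cuda.device_count() >= 2, "2 GPUs are recommended"

model = nn.Linear(2048, 200)  # placeholder for the actual PART model
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()
```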
1. Download the benchmark datasets and unzip them in your customized path:
   - CUB-200-2011: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
   - FGVC-Aircraft: https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/
   - Stanford-Cars: http://ai.stanford.edu/~jkrause/cars/car_dataset.html
2. Build the train/validation partition yourself (a sketch follows this list) or download the files from here.
3. Modify the configuration files in /config/config.py and /config/default.py.
4. Set up the dataset config:
   - 4-1 Set the dataset path in /config/config.py if using the CUB dataset.
   - 4-2 Set the dataset path in /datasets/UnifiedLoader.py if using other datasets.
   - 4-3 Add functions to /datasets/UnifiedLoader.py to build your own datasets.
5. Modify Lines 70~76: uncomment the dataset in use and comment out the others.
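For training step 2, here is a minimal sketch of deriving the partition from the metadata files shipped with CUB-200-2011 (`images.txt`, `image_class_labels.txt`, `train_test_split.txt`). The `<relative_path> <label>` output format is an assumption; match whatever /datasets/UnifiedLoader.py actually parses.

```python
# Minimal sketch: derive train/val lists from the CUB-200-2011 metadata.
# Output format "<relative_path> <label>" is an assumption; adapt it to
# whatever /datasets/UnifiedLoader.py expects.
import os

root = "/path/to/CUB_200_2011"  # your customized dataset path

def read_pairs(name):
    # Each metadata file holds "<image_id> <value>" pairs, one per line.
    with open(os.path.join(root, name)) as f:
        return dict(line.split() for line in f if line.strip())

paths = read_pairs("images.txt")                # image_id -> relative path
labels = read_pairs("image_class_labels.txt")   # image_id -> class id (1-based)
is_train = read_pairs("train_test_split.txt")   # image_id -> 1 (train) / 0 (test)

with open("train.txt", "w") as tr, open("val.txt", "w") as va:
    for img_id, rel_path in paths.items():
        out = tr if is_train[img_id] == "1" else va
        out.write(f"{rel_path} {int(labels[img_id]) - 1}\n")  # 0-based labels
```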
1. Repeat or confirm the operations in training steps 1~5.
2. Modify the class_num in /config/config.py (see the config sketch after this list).
3. Put the pretrained weights in your path.
4. Modify the path to the pretrained weights in /config/config.py.
5. Run test.py.
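Testing steps 2~4 all touch /config/config.py. The field names in this sketch are hypothetical (check the real file); only the class counts are fixed by the benchmarks themselves:

```python
# Hypothetical excerpt of /config/config.py -- all field names are
# assumptions; only the class counts come from the benchmarks.
dataset_path = "/path/to/CUB_200_2011"

class_num = 200    # CUB-200-2011: 200 bird species
# class_num = 196  # Stanford-Cars: 196 car models
# class_num = 100  # FGVC-Aircraft: 100 aircraft variants

pretrained_weights = "/path/to/pretrained/part_cub_resnet101.pth"  # hypothetical file name
```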
We believe the hyperparameters are robust across all training datasets; even the learning rate and the number of epochs need no modification. Tuning them may yield slightly better results, but that is not the main focus of our work.
Moreover, the pretrained models contain many unused branches, blocks, and decoders due to our suboptimal code construction; you do not need to keep these part branches for testing.
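One common way to drop such training-only branches at test time is to filter the checkpoint against the test model and load non-strictly. This is a sketch, not the repo's actual loading code; the checkpoint file name and the placeholder model are assumptions:

```python
# Minimal sketch: load a checkpoint that contains extra training-only
# branches into a test model that does not define them.
import torch
import torchvision

# Placeholder for the test-time model; substitute the actual PART test model.
model = torchvision.models.resnet101(num_classes=200)

checkpoint = torch.load("part_checkpoint.pth", map_location="cpu")  # hypothetical file name
state_dict = checkpoint.get("state_dict", checkpoint)

# Keep only the weights the test model actually declares, then load
# non-strictly so training-only branches are ignored.
model_keys = set(model.state_dict().keys())
filtered = {k: v for k, v in state_dict.items() if k in model_keys}
model.load_state_dict(filtered, strict=False)
```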
If you only have GPUs with less than 8 GB of memory, you should reduce the feature dimensions, the number of parts, and the number of transformer layers, which leads to slightly lower performance.
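These knobs would live in /config/config.py; the field names below are hypothetical and only illustrate the kind of reduction meant here:

```python
# Hypothetical low-memory settings for GPUs under 8 GB -- all field names
# are assumptions; look up the real ones in /config/config.py.
feature_dim = 1024      # e.g. reduced from 2048
part_num = 4            # fewer discovered parts
transformer_layers = 2  # shallower relation transformer
batch_size = 8          # smaller per-GPU batch
```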
| Datasets | CUB-200-2011 | Stanford-Cars | FGVC-Aircraft |
| --- | --- | --- | --- |
| Results in paper | 90.1% | 95.3% | 94.6% |
| Results of this repo | 90.2% | 95.3% | 94.7% |
| Links | ResNet-101 | ResNet-101 | ResNet-101 |
ResNet-50 for CUB-200-2011: Model
Our code is based on the implementations of the official PyTorch libraries, DETR, and tiny-baseline.
Please remember to cite us if you find this useful : )
@article{zhaoLCT21,
author = {Yifan Zhao and
Jia Li and
Xiaowu Chen and
Yonghong Tian},
title = {Part-Guided Relational Transformers for Fine-Grained Visual Recognition},
journal = {{IEEE} Trans. Image Process.},
volume = {30},
pages = {9470--9481},
year = {2021},
url = {https://doi.org/10.1109/TIP.2021.3126490},
doi = {10.1109/TIP.2021.3126490},
}
Please check our License files.