This code provides an implementation of the paper "Part-Guided Relational Transformers for Fine-Grained Visual Recognition". The paper addresses fine-grained recognition in one unified framework from two aspects: constructing feature-level interrelationships and capturing part-level discriminative features. This framework, named PArt-guided Relational Transformers (PART), learns discriminative part features with an automatic part discovery module and explores intrinsic correlations with a feature transformation module that adapts Transformer models.
PyTorch>=1.3, tqdm, torchvision, Pillow, OpenCV (cv2).
The code is written for multi-GPU training (2 GPUs are recommended) and has been tested on 2 NVIDIA RTX 3090s or 2 NVIDIA RTX 2080 Tis.
On other GPU configurations, the hyperparameters, including the batch size, should be adjusted to achieve similar results.
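To sanity-check the 2-GPU setup before a long run, here is a minimal sketch; the `nn.DataParallel` wrapping is only an assumption about the launch style, and the repo's actual entry point may differ:

```python
# Minimal sketch: verify that two GPUs are visible and wrap a model for them.
# nn.DataParallel is an illustration only; the repo may launch differently.
import torch
import torch.nn as nn

assert torch.cuda.device_count() >= 2, "2 GPUs are recommended"

model = nn.Linear(2048, 200)  # placeholder for the actual PART model
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()
```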
1. Download the benchmark datasets and unzip them in your customized path:
   - CUB-200-2011: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
   - FGVC-Aircraft: https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/
   - Stanford-Cars: http://ai.stanford.edu/~jkrause/cars/car_dataset.html
2. Build the train/validation partition yourself (a sketch follows this list) or download the files from here.
3. Modify the configuration files in /config/config.py and /config/default.py.
4. Set up the dataset config:
   - 4-1 Set the dataset path in /config/config.py if using the CUB dataset.
   - 4-2 Set the dataset path in /datasets/UnifiedLoader.py if using other datasets.
   - 4-3 Add functions to /datasets/UnifiedLoader.py to build your own datasets.
5. Modify Lines 70~76: uncomment the dataset in use and comment out the others.
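For training step 2, here is a minimal sketch of deriving the partition from the metadata files shipped with CUB-200-2011 (`images.txt`, `image_class_labels.txt`, `train_test_split.txt`). The `<relative_path> <label>` output format is an assumption; match whatever /datasets/UnifiedLoader.py actually parses.

```python
# Minimal sketch: derive train/val lists from the CUB-200-2011 metadata.
# Output format "<relative_path> <label>" is an assumption; adapt it to
# whatever /datasets/UnifiedLoader.py expects.
import os

root = "/path/to/CUB_200_2011"  # your customized dataset path

def read_pairs(name):
    # Each metadata file holds "<image_id> <value>" pairs, one per line.
    with open(os.path.join(root, name)) as f:
        return dict(line.split() for line in f if line.strip())

paths = read_pairs("images.txt")                # image_id -> relative path
labels = read_pairs("image_class_labels.txt")   # image_id -> class id (1-based)
is_train = read_pairs("train_test_split.txt")   # image_id -> 1 (train) / 0 (test)

with open("train.txt", "w") as tr, open("val.txt", "w") as va:
    for img_id, rel_path in paths.items():
        out = tr if is_train[img_id] == "1" else va
        out.write(f"{rel_path} {int(labels[img_id]) - 1}\n")  # 0-based labels
```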
1. Repeat or confirm the operations in training steps 1~5.
2. Modify the class_num in /config/config.py (see the config sketch after this list).
3. Put the pretrained weights in your path.
4. Modify the path to the pretrained weights in /config/config.py.
5. Run test.py.
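Testing steps 2~4 all touch /config/config.py. The field names in this sketch are hypothetical (check the real file); only the class counts are fixed by the benchmarks themselves:

```python
# Hypothetical excerpt of /config/config.py -- all field names are
# assumptions; only the class counts come from the benchmarks.
dataset_path = "/path/to/CUB_200_2011"

class_num = 200    # CUB-200-2011: 200 bird species
# class_num = 196  # Stanford-Cars: 196 car models
# class_num = 100  # FGVC-Aircraft: 100 aircraft variants

pretrained_weights = "/path/to/pretrained/part_cub_resnet101.pth"  # hypothetical file name
```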
We believe the hyperparameters are robust across all training datasets; even the learning rate and the number of epochs need no modification. Tuning them may yield slightly better results, but that is not the main focus of our work.
Moreover, the pretrained models contain many unused branches, blocks, and decoders due to our suboptimal code construction; you do not need to keep these part branches for testing.
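One common way to drop such training-only branches at test time is to filter the checkpoint against the test model and load non-strictly. This is a sketch, not the repo's actual loading code; the checkpoint file name and the placeholder model are assumptions:

```python
# Minimal sketch: load a checkpoint that contains extra training-only
# branches into a test model that does not define them.
import torch
import torchvision

# Placeholder for the test-time model; substitute the actual PART test model.
model = torchvision.models.resnet101(num_classes=200)

checkpoint = torch.load("part_checkpoint.pth", map_location="cpu")  # hypothetical file name
state_dict = checkpoint.get("state_dict", checkpoint)

# Keep only the weights the test model actually declares, then load
# non-strictly so training-only branches are ignored.
model_keys = set(model.state_dict().keys())
filtered = {k: v for k, v in state_dict.items() if k in model_keys}
model.load_state_dict(filtered, strict=False)
```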
If you only have GPUs with less than 8 GB of memory, you should reduce the feature dimensions, the number of parts, and the number of transformer layers, which leads to slightly lower performance.
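These knobs would live in /config/config.py; the field names below are hypothetical and only illustrate the kind of reduction meant here:

```python
# Hypothetical low-memory settings for GPUs under 8 GB -- all field names
# are assumptions; look up the real ones in /config/config.py.
feature_dim = 1024      # e.g. reduced from 2048
part_num = 4            # fewer discovered parts
transformer_layers = 2  # shallower relation transformer
batch_size = 8          # smaller per-GPU batch
```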
| Datasets | CUB-200-2011 | Stanford-Cars | FGVC-Aircraft |
| --- | --- | --- | --- |
| Results in paper | 90.1% | 95.3% | 94.6% |
| Results of this repo | 90.2% | 95.3% | 94.7% |
| Links | ResNet-101 | ResNet-101 | ResNet-101 |
ResNet-50 for CUB-200-2011: Model
Our code is based on the implementations of the official PyTorch libraries, DETR, and tiny-baseline.
Please remember to cite us if you find this useful : )
@article{zhaoLCT21,
author = {Yifan Zhao and
Jia Li and
Xiaowu Chen and
Yonghong Tian},
title = {Part-Guided Relational Transformers for Fine-Grained Visual Recognition},
journal = {{IEEE} Trans. Image Process.},
volume = {30},
pages = {9470--9481},
year = {2021},
url = {https://doi.org/10.1109/TIP.2021.3126490},
doi = {10.1109/TIP.2021.3126490},
}
Please check our License files.