Code for TCFormer, from the paper "Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer".

TCFormer (CVPR'2022 Oral, TPAMI'2024)

[CVPR'2022 paper] [TPAMI'2024 paper]

Introduction

Official code repository for the papers:
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
[Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, and Xiaogang Wang]

and

TCFormer: Visual Recognition via Token Clustering Transformer
[Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, and Xiaogang Wang]

TODO

  • Release whole-body pose estimation training/testing code.
  • Release whole-body pose estimation model zoo.
  • TCFormer-large on the COCO-WholeBody dataset.
  • FLOPs calculation function.
  • Integrate TCFormer into MMPose.

Model Zoo

You can find the pretrained checkpoints here.

Image Classification

For classification configs and weights, see here.

  • TCFormer on ImageNet-1K
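| Method         | Size | Acc@1 | #Params (M) | Config | Checkpoint    | Log      |
|----------------|------|-------|-------------|--------|---------------|----------|
| TCFormer-light | 224  | 79.4  | 14.2        | config | 57M [Google]  | [Google] |
| TCFormer       | 224  | 82.3  | 25.6        | config | 103M [Google] | [Google] |
| TCFormer-large | 224  | 83.6  | 62.8        | config | 103M [Google] | [Google] |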
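
As a quick sanity check on a downloaded checkpoint, the parameter counts in the table above can be turned into a rough file-size estimate. The sketch below assumes the weights are stored as 4-byte float32 values and that the file contains little else; neither assumption is stated in this repository, and the helper name `estimated_size_mb` is purely illustrative.

```python
# Rough float32 checkpoint-size estimate from the parameter counts listed above.
# Assumption (not confirmed by the repository): weights are stored as 4-byte
# floats and the checkpoint holds little besides the weights.
BYTES_PER_PARAM = 4

# (parameters in millions, listed checkpoint size in MB), copied from the table.
MODELS = {
    "TCFormer-light": (14.2, 57),
    "TCFormer": (25.6, 103),
    "TCFormer-large": (62.8, 103),
}


def estimated_size_mb(params_millions: float) -> float:
    """Return the estimated checkpoint size in megabytes (10**6 bytes)."""
    return params_millions * 1e6 * BYTES_PER_PARAM / 1e6


for name, (params_m, listed_mb) in MODELS.items():
    estimate = estimated_size_mb(params_m)
    print(f"{name}: estimated {estimate:.1f} MB, listed {listed_mb} MB")
```

Under these assumptions the estimates for TCFormer-light and TCFormer land close to the listed sizes, while the listed size for TCFormer-large is well below its estimate, so it is worth checking the actual size of that file after download.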

WholeBody Estimation

For WholeBody Estimation configs and weights, see here.

  • Results on COCO-WholeBody v1.0 val, using a detector with a human AP of 56.4 on the COCO val2017 dataset
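| Arch           | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
|----------------|------------|---------|---------|---------|---------|---------|---------|---------|---------|----------|----------|------|-----|
| TCFormer       | 256x192    | 0.697   | 0.774   | 0.705   | 0.821   | 0.656   | 0.753   | 0.539   | 0.652   | 0.576    | 0.681    | ckpt | log |
| TCFormer_large | 384x288    | 0.718   | 0.794   | 0.744   | 0.850   | 0.790   | 0.856   | 0.614   | 0.715   | 0.642    | 0.733    | ckpt | log |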

Citation

If you find this project useful in your research, please cite:

@inproceedings{zeng2022not,
  title={Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer},
  author={Zeng, Wang and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11101--11111},
  year={2022}
}

@article{zeng2024tcformer,
  title={TCFormer: Visual Recognition via Token Clustering Transformer},
  author={Zeng, Wang and Jin, Sheng and Xu, Lumin and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping and Wang, Xiaogang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}

Acknowledgement

Thanks to:

License

This project is released under the Apache 2.0 license.