Official code repository for the paper:
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, and Xiaogang Wang
- Whole-body pose estimation training/testing code release.
- Whole-body pose estimation model zoo release.
- TCFormer-large on the COCO-WholeBody dataset.
- FLOPs calculation function.
- Integration of TCFormer into MMPose.
You can find the pretrained checkpoints here.
For classification configs and weights, see here.
- TCFormer on ImageNet-1K
Method | Size | Acc@1 (%) | #Params (M) | Config | Checkpoint | log |
---|---|---|---|---|---|---|
TCFormer-light | 224 | 79.4 | 14.2 | config | 57M [Google] | [Google] |
TCFormer | 224 | 82.3 | 25.6 | config | 103M [Google] | [Google] |
TCFormer-large | 224 | 83.6 | 62.8 | config | 103M [Google] | [Google] |
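Acc@1 in the table above is top-1 accuracy: the percentage of images whose highest-scoring predicted class equals the ground-truth label. The snippet below is a minimal, framework-free sketch of that metric on toy data; it is not the repository's evaluation code, and the function name `top1_accuracy` is hypothetical.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent (hypothetical helper, for illustration only)."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(logits, labels):
        # Predicted class is the argmax of the class scores (first index wins on ties).
        pred = max(range(len(scores)), key=lambda c: scores[c])
        correct += int(pred == label)
    return 100.0 * correct / len(labels)


# Toy example: 4 samples, 3 classes; the last sample is misclassified.
logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.3, 0.2, 0.9], [1.0, 2.0, 0.5]]
labels = [0, 1, 2, 0]
print(top1_accuracy(logits, labels))  # 75.0
```

In practice the same computation is usually done on batched tensors, but the definition is identical.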
For whole-body pose estimation configs and weights, see here.
- Results on COCO-WholeBody v1.0 val, using a person detector with a human AP of 56.4 on COCO val2017
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TCFormer | 256x192 | 0.697 | 0.774 | 0.705 | 0.821 | 0.656 | 0.753 | 0.539 | 0.652 | 0.576 | 0.681 | ckpt | log |
TCFormer-large | 384x288 | 0.718 | 0.794 | 0.744 | 0.850 | 0.790 | 0.856 | 0.614 | 0.715 | 0.642 | 0.733 | ckpt | log |
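Assuming the standard COCO keypoint protocol that COCO-WholeBody builds on, the AP/AR columns above match predictions to ground truth with Object Keypoint Similarity (OKS), and AP is averaged over OKS thresholds 0.50:0.05:0.95. The sketch below computes per-instance OKS following the pycocotools convention; it is not code from this repository. The function name `oks` and the sigma values in the usage lines are hypothetical (COCO-WholeBody defines its own per-keypoint sigmas for all 133 keypoints).

```python
import math
from typing import Sequence, Tuple


def oks(pred: Sequence[Tuple[float, float]],
        gt: Sequence[Tuple[float, float, int]],
        sigmas: Sequence[float],
        area: float) -> float:
    """Object Keypoint Similarity between one predicted and one ground-truth instance.

    Per-keypoint error follows pycocotools: e_i = d_i^2 / (2 * area * (2 * sigma_i)^2),
    and OKS is the mean of exp(-e_i) over keypoints whose visibility flag v_i > 0.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy, v), sigma in zip(pred, gt, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        var = (2 * sigma) ** 2
        total += math.exp(-d2 / (2 * area * var))
        count += 1
    # Simplification: pycocotools handles instances with no labeled keypoints differently.
    return total / count if count else 0.0


# Hypothetical two-keypoint instance; a perfect prediction gives OKS = 1.0.
sigmas = [0.025, 0.025]
gt = [(10.0, 10.0, 2), (20.0, 20.0, 2)]
print(oks([(10.0, 10.0), (20.0, 20.0)], gt, sigmas, area=100.0))  # 1.0
```

A prediction counts as a true positive at a given threshold when its OKS with the matched ground truth exceeds that threshold; Body, Foot, Face, Hand, and Whole AP differ only in which keypoints enter the sum.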
If you use this code for a paper, please cite:
@inproceedings{zeng2022not,
title={Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer},
author={Zeng, Wang and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={11101--11111},
year={2022}
}
Thanks to:
This project is released under the Apache 2.0 license.