PyTorch implementation of SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation

[AAAI'24] SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation

[ Paper ] [ Code ]

High-resolution representation is essential for achieving good performance in human pose estimation models. To obtain such features, existing works utilize high-resolution input images or fine-grained image tokens. However, this dense high-resolution representation brings a significant computational burden. In this paper, we address the following question: "Since only sparse human keypoint locations are detected for human pose estimation, is it really necessary to describe the whole image in a dense, high-resolution manner?" Based on dynamic transformer models, we propose a framework that only uses Sparse High-resolution Representations for human Pose estimation (SHaRPose). In detail, SHaRPose consists of two stages. At the coarse stage, the relations between image regions and keypoints are dynamically mined while a coarse estimation is generated. Then, a quality predictor is applied to decide whether the coarse estimation results should be refined. At the fine stage, SHaRPose builds sparse high-resolution representations only on the regions related to the keypoints and provides refined high-precision human pose estimations. Extensive experiments demonstrate the outstanding performance of the proposed method.
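
To make the pipeline concrete, here is a rough, purely illustrative sketch of the coarse-to-fine idea in PyTorch-style pseudocode. It is not the repository's actual interface; every name below (coarse_to_fine_pose, coarse_stage, quality_predictor, fine_stage, quality_threshold, num_fine_regions) is a hypothetical placeholder for the components described in the abstract.

# Illustrative sketch only (NOT the repository's code); all names are hypothetical placeholders.
import torch

def coarse_to_fine_pose(image: torch.Tensor,
                        coarse_stage, quality_predictor, fine_stage,
                        quality_threshold: float = 0.5,
                        num_fine_regions: int = 16):
    # Coarse stage: coarse image tokens and keypoint tokens interact,
    # producing a coarse pose and keypoint-to-region attention.
    coarse_pose, kpt_to_region_attn, tokens = coarse_stage(image)

    # The quality predictor decides whether the coarse estimate needs refining.
    if quality_predictor(tokens) >= quality_threshold:
        return coarse_pose  # early exit: keep the coarse estimate

    # Fine stage: only the regions most related to the keypoints are re-tokenized
    # at high resolution; the rest stays coarse (a sparse high-resolution representation).
    related_regions = kpt_to_region_attn.mean(dim=0).topk(num_fine_regions).indices
    return fine_stage(image, tokens, related_regions)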

Preparation

0. Requirements

  • Linux
  • CUDA (devel/runtime) ≥11.6
  • conda

1. Clone

git clone https://github.com/AnxQ/sharpose && cd sharpose

2. Environment

conda create -n mmlab_0.x python=3.8
conda activate mmlab_0.x

### For GPUs earlier than the RTX 40 series
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html
git clone https://github.com/open-mmlab/mmpose.git && cd mmpose && git switch 0.x
pip install -r requirements.txt
pip install -v . 
cd ..

### For RTX 40-series GPUs (modify & build mmcv-full from source)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

git clone https://github.com/open-mmlab/mmcv.git && cd mmcv && git checkout tags/v1.7.1
pip install -r requirements.txt
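# Work around newer PyTorch builds that no longer define DDP's _use_replicated_tensor_module attribute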
sed -i "160s/self._use_replicated_tensor_module/getattr(self, '_use_replicated_tensor_module', None)/g" mmcv/parallel/distributed.py
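# Build the CUDA/C++ ops with C++17, which newer PyTorch headers require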
sed -i 's/-std=c++14/-std=c++17/' setup.py
MMCV_WITH_OPS=1 pip install -v .
cd ..

git clone https://github.com/open-mmlab/mmpose.git && cd mmpose && git switch 0.x
pip install -r requirements.txt
pip install -v . 
cd ..

### Common 
pip install -r requirements.txt
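
To confirm the environment is set up correctly before moving on, a quick sanity check (run inside the conda environment) could look like this; the expected versions simply reflect the install commands above:

# Quick environment sanity check (illustrative)
import torch, mmcv, mmpose

print('torch:', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('mmcv-full:', mmcv.__version__)   # a 1.x release is expected for the mmpose 0.x branch
print('mmpose:', mmpose.__version__)    # a 0.x release is expected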

3. Dataset

  • Download and extract COCO following this, with sharpose as the root directory.
  • Download and extract MPII following this, with sharpose as the root directory (a typical layout is sketched below).
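
After preparation, the data folders typically follow mmpose 0.x's dataset layout; exact file names may differ depending on the download source:

data
├── coco
│   ├── annotations
│   │   ├── person_keypoints_train2017.json
│   │   └── person_keypoints_val2017.json
│   ├── person_detection_results
│   │   └── COCO_val2017_detections_AP_H_56_person.json
│   ├── train2017
│   └── val2017
└── mpii
    ├── annotations
    │   ├── mpii_gt_val.mat
    │   ├── mpii_train.json
    │   └── mpii_val.json
    └── images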

4. Checkpoints

Download the checkpoints from OneDrive; the link includes the following files:

  • MAE-pretrained checkpoints provided by MAE and ViTPose.
  • Fully trained SHaRPose checkpoints.
# the directory layout should look like this
sharpose
├── mmpose
├── mmcv
├── mmcv_custom
├── configs
├── models
├── data
│   ├── mpii
│   └── coco
└── checkpoints
    ├── pretrained
    │   ├── mae_pretrain_vit_base.pth
    │   └── mae_pretrain_vit_small.pth
    └── pose
        ├── sharpose_base_coco_256x192_4xb64_210e.pth
        └── ...

Evaluation

Example of evaluating the base model at a resolution of 384x288 on 2 GPUs:

./dist_test.sh configs/sharpose_base_coco_384x288_4xb64_210e.py checkpoints/pose/sharpose_coco_384x288_b_alpha0.3.pth 2
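
For a quick single-GPU run without the distributed launcher, a minimal sketch using mmcv/mmpose 0.x's Python APIs could look like the following. It assumes the SHaRPose model builds through mmpose's build_posenet registry (an assumption based on the mmpose 0.x setup); the batch size, worker count, and result folder are arbitrary choices.

# Minimal single-GPU evaluation sketch, assuming mmpose 0.x APIs
import os
from mmcv import Config
from mmcv.parallel import MMDataParallel
from mmcv.runner import load_checkpoint
from mmpose.apis import single_gpu_test
from mmpose.datasets import build_dataloader, build_dataset
from mmpose.models import build_posenet

cfg = Config.fromfile('configs/sharpose_base_coco_384x288_4xb64_210e.py')
cfg.model.pretrained = None  # weights come from the checkpoint loaded below

dataset = build_dataset(cfg.data.test, dict(test_mode=True))
data_loader = build_dataloader(dataset, samples_per_gpu=32, workers_per_gpu=2,
                               dist=False, shuffle=False)

model = build_posenet(cfg.model)
load_checkpoint(model, 'checkpoints/pose/sharpose_coco_384x288_b_alpha0.3.pth',
                map_location='cpu')
model = MMDataParallel(model, device_ids=[0])

outputs = single_gpu_test(model, data_loader)
os.makedirs('work_dirs/eval', exist_ok=True)
print(dataset.evaluate(outputs, 'work_dirs/eval', metric='mAP'))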

Training

Example of training the base model at a resolution of 256x192 on 4 GPUs:

./dist_train.sh configs/sharpose_base_coco_256x192_4xb64_210e.py 4 --cfg-options model.pretrained=checkpoints/pretrained/mae_pretrain_vit_base.pth

Note that a pretrained checkpoint is compatible with any setting (e.g., resolution or alpha) at the same model scale (e.g., base or small).

Visualization
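
Assuming the SHaRPose models register with mmpose 0.x's standard top-down APIs (an assumption; the image path and bounding box below are placeholders), a single-image visualization can be sketched as follows:

# Hedged sketch: single-image inference + visualization via mmpose 0.x top-down APIs
from mmpose.apis import (inference_top_down_pose_model, init_pose_model,
                         vis_pose_result)

config = 'configs/sharpose_base_coco_256x192_4xb64_210e.py'
checkpoint = 'checkpoints/pose/sharpose_base_coco_256x192_4xb64_210e.pth'
model = init_pose_model(config, checkpoint, device='cuda:0')

# Whole-image box in xywh format; in practice, feed boxes from a person detector.
person_results = [{'bbox': [0, 0, 640, 480]}]

pose_results, _ = inference_top_down_pose_model(
    model, 'demo.jpg', person_results, format='xywh',
    dataset='TopDownCocoDataset')

vis_pose_result(model, 'demo.jpg', pose_results,
                dataset='TopDownCocoDataset', out_file='vis_demo.jpg')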

Citation

This will be updated once the AAAI'24 proceedings are released.

@inproceedings{sharpose2024,
    title={SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation},
    booktitle={AAAI},
    author={Xiaoqi An and Lin Zhao and Chen Gong and Nannan Wang and Di Wang and Jian Yang},
    year={2024}
}

Acknowledgement

  • MMPose
  • TokenPose
  • CF-ViT
  • ViTPose
  • MAE