Diffusion-based Pose Refinement and Multi-Hypothesis Generation for 3D Human Pose Estimation

Diffusion-based Pose Refinement and Multi-Hypothesis Generation for 3D Human Pose Estimation,
Hongbo Kang, Yong Wang, Mengyuan Liu, Doudou Wu, Peng Liu, Wenming Yang
arXiv, 2024

This version provides refinement for single-frame models, and future versions will update refinement for multi-frame models.

Results on Human3.6M

Refinement

Method	MPJPE(CPN)
HTNet	48.9 mm
DRPose(w\ HTNet)*	48.3 mm (-0.6)
DC-GCT	48.4 mm
DRPose(w\ DC-GCT)*	47.9 mm (-0.5)

Single-hypothesis

Method	MPJPE(CPN)	P-MPJPE(CPN)	MPJPE(GT)
HTNet	48.9 mm	39.0 mm	34.0 mm
DC-GCT	48.4 mm	38.2 mm	32.4 mm
GFPose*	51.9 mm	-	-
DRPose(w\ DC-GCT)*	47.9 mm	38.1 mm	30.5 mm

Multi-hypothesis

Method	Hypotheses	MPJPE	P-MPJPE
GFPose	10	45.1 mm	-
DRPose(w\ DC-GCT)	10	41.8 mm	33.7 mm
GFPose	200	35.6 mm	30.5 mm
DRPose(w\ DC-GCT)	200	35.5 mm	28.6 mm

Dependencies

Python 3.7+
PyTorch >= 1.10.0

pip install -r requirement.txt

Dataset setup

Please download the dataset here and refer to VideoPose3D to set up the Human3.6M dataset ('./dataset' directory).

${POSE_ROOT}/
|-- dataset
|   |-- data_3d_h36m.npz
|   |-- data_2d_h36m_gt.npz
|   |-- data_2d_h36m_cpn_ft_h36m_dbb.npz

Download pretrained model

The pretrained model is here, please download it and put it in the './checkpoint' directory.

Test the model

To test on Human3.6M on single frame, run:

python main.py --test --previous_dir 'checkpoint/pretrained/cpn_dcgct_4794' --init_model 'dcgct' -k cpn_ft_h36m_dbb --samplimg_timestep 2 --num_proposals 2

You can balance efficiency and accuracy by adjusting --num_proposals (number of hypotheses) and --sampling_timesteps (number of iterations).

The results are saved in the './output' directory. In the results, p_avg and p_best are evaluation metrics related to pose-level, while j_avg and j_best are evaluation metrics related to joint-level. For more details, please refer to D3DP.

Train the model

To train on Human3.6M with single frame, run:

python main.py --init_model 'dcgct' -k cpn_ft_h36m_dbb --timestep 1000

You can set your own initial model using --init_model and modify the initial model loading code in main.py. --timestep is the maximum diffusion time step.

Visualization

coming soon

Citation

If you find our work useful in your research, please consider citing:

@article{kang2024diffusion,
title={Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton},
author={Kang, Hongbo and Wang, Yong and Liu, Mengyuan and Wu, Doudou and Liu, Peng and Yuan, Xinlin and Yang, Wenming},
journal={arXiv preprint arXiv:2401.04921},
year={2024}
}

Acknowledgement

Our code is extended from the following repositories. We thank the authors for releasing the codes.

KHB1698/DRPose