Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey

Authors: Yang Liu, Changzhen Qiu, Zhiyong Zhang*

School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen, Guangdong, China

Overview

This is the regularly updated project page for Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey, a review of deep learning approaches to 3D human pose estimation and human mesh recovery. The survey covers recent state-of-the-art publications (2019 to present) from mainstream computer vision conferences and journals.

Please open an issue if you have any suggestions!

Citation

Please cite the paper if our work is useful for your research.

@article{liu2024deep,
      title={Deep learning for 3D human pose estimation and mesh recovery: A survey}, 
      author={Liu, Yang and Qiu, Changzhen and Zhang, Zhiyong},
      journal={Neurocomputing},
      pages={128049},
      year={2024},
      issn={0925-2312},
      doi={10.1016/j.neucom.2024.128049},
      publisher={Elsevier}
}

3D human pose estimation

  • Single Person
  • Multi-person
    • Top-down
      • Solving Real-time Problems
      • Solving Representation Limitation
      • Solving Occlusion Problems
      • Solving Data Lacking
    • Bottom-up
      • Solving Real-time Problems
        • Fabbri et al. [paper]
      • Solving Supervisory Limitation
      • Solving Data Lacking
      • Solving Occlusion Problems
    • Others
      • Single Stage
      • Top-down & Bottom-up

Human Mesh Recovery

Overview of the mainstream datasets.

| Dataset | Type | Data | Total frames | Feature | Download link |
|---|---|---|---|---|---|
| Human3.6M | 3D/Mesh | Video | 3.6M | multi-view | Website |
| 3DPW | 3D/Mesh | Video | 51K | multi-person | Website |
| MPI-INF-3DHP | 2D/3D | Video | 2K | in-wild | Website |
| HumanEva | 3D | Video | 40K | multi-view | Website |
| CMU-Panoptic | 3D | Video | 1.5M | multi-view/multi-person | Website |
| MuCo-3DHP | 3D | Image | 8K | multi-person/occluded scene | Website |
| SURREAL | 2D/3D/Mesh | Video | 6.0M | synthetic model | Website |
| 3DOH50K | 2D/3D/Mesh | Image | 51K | object-occluded | Website |
| 3DCP | Mesh | Mesh | 190 | contact | Website |
| AMASS | Mesh | Motion | 11K | soft-tissue dynamics | Website |
| DensePose | Mesh | Image | 50K | multi-person | Website |
| UP-3D | 3D/Mesh | Image | 8K | sport scene | Website |
| THuman2.0 | Mesh | Image | 7K | textured surface | Website |

Comparisons of 3D pose estimation methods on Human3.6M.

| Method | Year | Publication | Highlight | MPJPE↓ (mm) | PA-MPJPE↓ (mm) | Code |
|---|---|---|---|---|---|---|
| Graformer | 2022 | CVPR'22 | graph-based transformer | 35.2 | - | Code |
| GLA-GCN | 2023 | ICCV'23 | adaptive GCN | 34.4 | 37.8 | Code |
| PoseDA | 2023 | arXiv'23 | domain adaptation | 49.4 | 34.2 | Code |
| GFPose | 2023 | CVPR'23 | gradient fields | 35.6 | 30.5 | Code |
| TP-LSTMs | 2022 | TPAMI'22 | pose similarity metric | 40.5 | 31.8 | - |
| FTCM | 2023 | TCSVT'23 | frequency-temporal collaborative | 28.1 | - | Code |
| VideoPose3D | 2019 | CVPR'19 | semi-supervised | 46.8 | 36.5 | Code |
| PoseFormer | 2021 | ICCV'21 | spatio-temporal transformer | 44.3 | 34.6 | Code |
| STCFormer | 2023 | CVPR'23 | spatio-temporal transformer | 40.5 | 31.8 | Code |
| 3Dpose_ssl | 2020 | TPAMI'20 | self-supervised | 63.6 | 63.7 | Code |
| MTF-Transformer | 2022 | TPAMI'22 | multi-view temporal fusion | 26.2 | - | Code |
| AdaptPose | 2022 | CVPR'22 | cross datasets | 42.5 | 34.0 | Code |
| 3D-HPE-PAA | 2022 | TIP'22 | part aware attention | 43.1 | 33.7 | Code |
| DeciWatch | 2022 | ECCV'22 | efficient framework | 52.8 | - | Code |
| Diffpose | 2023 | CVPR'23 | pose refine | 36.9 | 28.7 | Code |
| Elepose | 2022 | CVPR'22 | unsupervised | - | 36.7 | Code |
| Uplift and Upsample | 2023 | CVPR'23 | efficient transformers | 48.1 | 37.6 | Code |
| RS-Net | 2023 | TIP'23 | regular splitting graph network | 48.6 | 38.9 | Code |
| HSTFormer | 2023 | arXiv'23 | spatial-temporal transformers | 42.7 | 33.7 | Code |
| PoseFormerV2 | 2023 | CVPR'23 | frequency domain | 45.2 | 35.6 | Code |
| DiffPose | 2023 | ICCV'23 | diffusion models | 42.9 | 30.8 | Code |
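
For readers new to the MPJPE column above, the following is a minimal sketch of how a mean per-joint position error is commonly computed for a single pose. It is not taken from any of the listed repositories. The root joint index (`root=0`), the three-joint toy skeleton, and the function name `mpjpe` are assumptions made for illustration. The Procrustes-aligned variant reported next to it additionally fits a rigid similarity transform before averaging, which is omitted here.

```python
import math


def mpjpe(pred, gt, root=0):
    """Mean per-joint position error, in the unit of the inputs (mm here).

    pred, gt: lists of (x, y, z) joint coordinates in the same joint order.
    root: index of the joint used to align both poses before comparison
          (assumed to be the pelvis at index 0; a hypothetical layout).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("poses must be non-empty and have the same number of joints")
    # Translate both poses so that their root joints sit at the origin.
    pred_rel = [tuple(c - r for c, r in zip(joint, pred[root])) for joint in pred]
    gt_rel = [tuple(c - r for c, r in zip(joint, gt[root])) for joint in gt]
    # Average the Euclidean distance over all joints.
    return sum(math.dist(p, g) for p, g in zip(pred_rel, gt_rel)) / len(gt)


# Toy example: three joints, coordinates in millimetres.
gt_pose = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
pred_pose = [(10.0, 0.0, 0.0), (10.0, 130.0, 0.0), (10.0, 200.0, 40.0)]
error_mm = mpjpe(pred_pose, gt_pose)
print(f"MPJPE: {error_mm:.2f} mm")  # per-joint errors are 0, 30 and 40 mm
```

A benchmark score is this quantity averaged over all test frames. Individual papers may differ in joint set and evaluation protocol, so the numbers in the table are not necessarily reproduced by this sketch.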

Comparisons of 3D pose estimation methods on MPI-INF-3DHP.

| Method | Year | Publication | Highlight | MPJPE↓ (mm) | PCK↑ (%) | AUC↑ (%) | Code |
|---|---|---|---|---|---|---|---|
| HSTFormer | 2023 | arXiv'23 | spatial-temporal transformers | 28.3 | 98.0 | 78.6 | Code |
| PoseFormerV2 | 2023 | CVPR'23 | frequency domain | 27.8 | 97.9 | 78.8 | Code |
| Uplift and Upsample | 2023 | CVPR'23 | efficient transformers | 46.9 | 95.4 | 67.6 | Code |
| RS-Net | 2023 | TIP'23 | regular splitting graph network | - | 85.6 | 53.2 | Code |
| Diffpose | 2023 | CVPR'23 | pose refine | 29.1 | 98.0 | 75.9 | Code |
| FTCM | 2023 | TCSVT'23 | frequency-temporal collaborative | 31.2 | 97.9 | 79.8 | Code |
| STCFormer | 2023 | CVPR'23 | spatio-temporal transformer | 23.1 | 98.7 | 83.9 | Code |
| PoseDA | 2023 | arXiv'23 | domain adaptation | 61.3 | 92.0 | 62.5 | Code |
| TP-LSTMs | 2022 | TPAMI'22 | pose similarity metric | 48.8 | 82.6 | 81.3 | - |
| AdaptPose | 2022 | CVPR'22 | cross datasets | 77.2 | 88.4 | 54.2 | Code |
| 3D-HPE-PAA | 2022 | TIP'22 | part aware attention | 69.4 | 90.3 | 57.8 | Code |
| Elepose | 2022 | CVPR'22 | unsupervised | 54.0 | 86.0 | 50.1 | Code |
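
The PCK and AUC columns above can be illustrated with a small sketch that starts from per-joint errors in millimetres. The 150 mm threshold, the 31 evenly spaced thresholds between 0 and 150 mm, and the use of `<=` rather than `<` are assumptions based on common practice for MPI-INF-3DHP; the listed methods may use slightly different settings. The function names are hypothetical.

```python
def pck(errors, threshold=150.0):
    """Percentage of joints whose error is at or below `threshold` (same unit as errors)."""
    if not errors:
        raise ValueError("errors must not be empty")
    return 100.0 * sum(e <= threshold for e in errors) / len(errors)


def auc(errors, max_threshold=150.0, steps=31):
    """Mean PCK over `steps` evenly spaced thresholds in [0, max_threshold]."""
    thresholds = [max_threshold * i / (steps - 1) for i in range(steps)]
    return sum(pck(errors, t) for t in thresholds) / steps


# Toy per-joint errors in millimetres (not taken from any method in the table).
errors = [0.0, 75.0, 150.0, 300.0]
print(f"PCK@150mm: {pck(errors):.1f} %")  # 3 of 4 joints are within 150 mm
print(f"AUC: {auc(errors):.1f} %")
```

Because AUC averages PCK over many thresholds, it rewards methods whose errors are small across the board, while PCK at a single threshold only counts how many joints clear that one bar.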

Comparisons of human mesh recovery methods on Human3.6M and 3DPW.

| Method | Publication | Highlight | Human3.6M MPJPE↓ (mm) | Human3.6M PA-MPJPE↓ (mm) | 3DPW MPJPE↓ (mm) | 3DPW PA-MPJPE↓ (mm) | 3DPW PVE↓ (mm) | Code |
|---|---|---|---|---|---|---|---|---|
| VirtualMarker | CVPR'23 | novel intermediate representation | 47.3 | 32.0 | 67.5 | 41.3 | 77.9 | Code |
| NIKI | CVPR'23 | inverse kinematics | - | - | 71.3 | 40.6 | 86.6 | Code |
| TORE | ICCV'23 | efficient transformer | 59.6 | 36.4 | 72.3 | 44.4 | 88.2 | Code |
| JOTR | ICCV'23 | contrastive learning | - | - | 76.4 | 48.7 | 92.6 | Code |
| HMDiff | ICCV'23 | reverse diffusion processing | 49.3 | 32.4 | 72.7 | 44.5 | 82.4 | Code |
| ReFit | ICCV'23 | recurrent fitting network | 48.4 | 32.2 | 65.8 | 41.0 | - | Code |
| PyMAF-X | TPAMI'23 | regression-based one-stage whole body | - | - | 74.2 | 45.3 | 87.0 | Code |
| PointHMR | CVPR'23 | vertex-relevant feature extraction | 48.3 | 32.9 | 73.9 | 44.9 | 85.5 | - |
| PLIKS | CVPR'23 | inverse kinematics | 47.0 | 34.5 | 60.5 | 38.5 | 73.3 | Code |
| ProPose | CVPR'23 | learning analytical posterior probability | 45.7 | 29.1 | 68.3 | 40.6 | 79.4 | Code |
| POTTER | CVPR'23 | pooling attention transformer | 56.5 | 35.1 | 75.0 | 44.8 | 87.4 | Code |
| PoseExaminer | ICCV'23 | automated testing of out-of-distribution | - | - | 74.5 | 46.5 | 88.6 | Code |
| MotionBERT | ICCV'23 | pretrained human representations | 43.1 | 27.8 | 68.8 | 40.6 | 79.4 | Code |
| 3DNBF | ICCV'23 | analysis-by-synthesis approach | - | - | 88.8 | 53.3 | - | Code |
| FastMETRO | ECCV'22 | efficient architecture | 52.2 | 33.7 | 73.5 | 44.6 | 84.1 | Code |
| CLIFF | ECCV'22 | multi-modality inputs | 47.1 | 32.7 | 69.0 | 43.0 | 81.2 | Code |
| PARE | ICCV'21 | part-driven attention | - | - | 74.5 | 46.5 | 88.6 | Code |
| Graphormer | ICCV'21 | GCNN-reinforced transformer | 51.2 | 34.5 | 74.7 | 45.6 | 87.7 | Code |
| PSVT | CVPR'23 | spatio-temporal encoder | - | - | 73.1 | 43.5 | 84.0 | - |
| GLoT | CVPR'23 | short-term and long-term temporal correlations | 67.0 | 46.3 | 80.7 | 50.6 | 96.3 | Code |
| MPS-Net | CVPR'23 | temporally adjacent representations | 69.4 | 47.4 | 91.6 | 54.0 | 109.6 | Code |
| MAED | ICCV'21 | multi-level attention | 56.4 | 38.7 | 79.1 | 45.7 | 92.6 | Code |
| Lee et al. | ICCV'21 | uncertainty-aware | 58.4 | 38.4 | 92.8 | 52.2 | 106.1 | - |
| TCMR | CVPR'21 | temporal consistency | 62.3 | 41.1 | 95.0 | 55.8 | 111.3 | - |
| VIBE | CVPR'20 | self-attention temporal network | 65.6 | 41.4 | 82.9 | 51.9 | 99.1 | Code |
| ImpHMR | CVPR'23 | implicitly imagine person in 3D space | - | - | 74.3 | 45.4 | 87.1 | - |
| SGRE | ICCV'23 | sequentially global rotation estimation | - | - | 78.4 | 49.6 | 93.3 | Code |
| PMCE | ICCV'23 | pose and mesh co-evolution network | 53.5 | 37.7 | 69.5 | 46.7 | 84.8 | Code |
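
The PA-MPJPE columns above report joint error after a Procrustes alignment, which removes global scale, rotation, and translation differences between prediction and ground truth. The sketch below shows one standard way to do this with an SVD-based similarity fit in NumPy. It is an illustration under stated assumptions, not the evaluation code of any listed method; the function name `pa_mpjpe` and the four-point toy pose are hypothetical.

```python
import numpy as np


def pa_mpjpe(pred, gt):
    """Mean per-joint error after aligning `pred` to `gt` with a similarity transform.

    pred, gt: arrays of shape (num_joints, 3) in the same joint order.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Centre both point sets to remove translation.
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    x, y = pred - mu_pred, gt - mu_gt
    # SVD of the cross-covariance gives the optimal rotation (row-vector convention).
    u, s, vt = np.linalg.svd(x.T @ y)
    d = np.ones(3)
    if np.linalg.det(u @ vt) < 0:
        d[-1] = -1.0  # flip the last axis so the result is a rotation, not a reflection
    rot = u @ np.diag(d) @ vt
    # Least-squares optimal isotropic scale for the chosen rotation.
    scale = (s * d).sum() / (x ** 2).sum()
    aligned = scale * x @ rot + mu_gt
    return float(np.linalg.norm(aligned - gt, axis=1).mean())


# Toy check: a scaled, rotated and shifted copy of the ground truth aligns back to it.
gt = np.array([[0, 0, 0], [100, 0, 0], [0, 100, 0], [0, 0, 100]], dtype=float)
rot_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
pred = 2.0 * gt @ rot_z + np.array([5.0, -3.0, 7.0])
print(pa_mpjpe(pred, gt))  # numerically close to zero
```

PVE is typically the same mean Euclidean distance taken over mesh vertices instead of joints, without this alignment step.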