Authors: Yang Liu, Changzhen Qiu, Zhiyong Zhang*
School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen, Guangdong, China
This is the regularly updated project page of Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey, a review focused on deep learning approaches to 3D human pose estimation and human mesh recovery. The survey comprehensively covers recent state-of-the-art publications (2019-present) from mainstream computer vision conferences and journals.
Please open an issue if you have any suggestions!
@article{liu2024deep,
title={Deep learning for 3D human pose estimation and mesh recovery: A survey},
author={Liu, Yang and Qiu, Changzhen and Zhang, Zhiyong},
journal={Neurocomputing},
pages={128049},
year={2024},
issn={0925-2312},
doi={10.1016/j.neucom.2024.128049},
publisher={Elsevier}
}
- Single Person
  - In Images
    - Solving Depth Ambiguity
    - Solving Body Structure Understanding
    - Solving Occlusion Problems
    - Solving Data Lacking
  - In Videos
    - Solving Single-frame Limitation
    - Solving Real-time Problems
    - Solving Body Structure Understanding
    - Solving Occlusion Problems
    - Solving Data Lacking
- Multi-person
  - In Images
    - Top-down
    - Bottom-up
      - Solving Real-time Problems: Fabbri et al. [paper]
  - Others
- Template-based
  - Naked
    - Multimodal Methods
    - Utilizing Attention Mechanism
    - Exploiting Temporal Information
    - Multi-view Methods
    - Boosting Efficiency
    - Developing Various Representations
    - Utilizing Structural Information
    - Choosing Appropriate Learning Strategies
      - Self-improving: SPIN [paper], ReFit [paper], You et al. [paper]
      - Novel losses: Zanfir et al. [paper], Jiang et al. [paper]
      - Unsupervised learning: Madadi et al. [paper], Yu et al. [paper]
      - Bilevel online adaptation: Guan et al. [paper]
      - Single-shot: Pose2UV [paper]
      - Contrastive learning: JOTR [paper]
      - Domain adaptation: Nam et al. [paper]
  - Detailed
    - With Clothes
    - With Hands
    - Whole Body
- Template-free
  - Naked
Commonly used datasets for 3D human pose estimation and mesh recovery:

Dataset | Type | Data | Total frames | Feature | Download link |
---|---|---|---|---|---|
Human3.6M | 3D/Mesh | Video | 3.6M | multi-view | Website |
3DPW | 3D/Mesh | Video | 51K | multi-person | Website |
MPI-INF-3DHP | 2D/3D | Video | 2K | in-the-wild | Website |
HumanEva | 3D | Video | 40K | multi-view | Website |
CMU-Panoptic | 3D | Video | 1.5M | multi-view/multi-person | Website |
MuCo-3DHP | 3D | Image | 8K | multi-person/occluded scene | Website |
SURREAL | 2D/3D/Mesh | Video | 6.0M | synthetic model | Website |
3DOH50K | 2D/3D/Mesh | Image | 51K | object-occluded | Website |
3DCP | Mesh | Mesh | 190 | contact | Website |
AMASS | Mesh | Motion | 11K | soft-tissue dynamics | Website |
DensePose | Mesh | Image | 50K | multi-person | Website |
UP-3D | 3D/Mesh | Image | 8K | sport scene | Website |
THuman2.0 | Mesh | Image | 7K | textured surface | Website |
3D human pose estimation results on Human3.6M:

Method | Year | Publication | Highlight | MPJPE↓ | PA-MPJPE↓ | Code |
---|---|---|---|---|---|---|
GraFormer | 2022 | CVPR'22 | graph-based transformer | 35.2 | - | Code |
GLA-GCN | 2023 | ICCV'23 | adaptive GCN | 34.4 | 37.8 | Code |
PoseDA | 2023 | arXiv'23 | domain adaptation | 49.4 | 34.2 | Code |
GFPose | 2023 | CVPR'23 | gradient fields | 35.6 | 30.5 | Code |
TP-LSTMs | 2022 | TPAMI'22 | pose similarity metric | 40.5 | 31.8 | - |
FTCM | 2023 | TCSVT'23 | frequency-temporal collaborative | 28.1 | - | Code |
VideoPose3D | 2019 | CVPR'19 | semi-supervised | 46.8 | 36.5 | Code |
PoseFormer | 2021 | ICCV'21 | spatio-temporal transformer | 44.3 | 34.6 | Code |
STCFormer | 2023 | CVPR'23 | spatio-temporal transformer | 40.5 | 31.8 | Code |
3Dpose_ssl | 2020 | TPAMI'20 | self-supervised | 63.6 | 63.7 | Code |
MTF-Transformer | 2022 | TPAMI'22 | multi-view temporal fusion | 26.2 | - | Code |
AdaptPose | 2022 | CVPR'22 | cross datasets | 42.5 | 34.0 | Code |
3D-HPE-PAA | 2022 | TIP'22 | part aware attention | 43.1 | 33.7 | Code |
DeciWatch | 2022 | ECCV'22 | efficient framework | 52.8 | - | Code |
DiffPose | 2023 | CVPR'23 | pose refinement | 36.9 | 28.7 | Code |
ElePose | 2022 | CVPR'22 | unsupervised | - | 36.7 | Code |
Uplift and Upsample | 2023 | WACV'23 | efficient transformers | 48.1 | 37.6 | Code |
RS-Net | 2023 | TIP'23 | regular splitting graph network | 48.6 | 38.9 | Code |
HSTFormer | 2023 | arXiv'23 | spatial-temporal transformers | 42.7 | 33.7 | Code |
PoseFormerV2 | 2023 | CVPR'23 | frequency domain | 45.2 | 35.6 | Code |
DiffPose | 2023 | ICCV'23 | diffusion models | 42.9 | 30.8 | Code |
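For reference, the MPJPE and PA-MPJPE columns above follow the standard definitions: mean per-joint position error in millimeters, before and after a rigid Procrustes alignment of the prediction to the ground truth. Below is a minimal NumPy sketch of both metrics; the function names and the assumption that inputs are (num_joints, 3) arrays in millimeters are illustrative, not taken from any particular codebase.

```python
import numpy as np

def mpjpe(pred, gt):
    """MPJPE: mean Euclidean distance per joint, in the input units (mm)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """PA-MPJPE: MPJPE after optimal rigid alignment (rotation, isotropic
    scale, translation) of the prediction to the ground truth."""
    x, y = pred - pred.mean(0), gt - gt.mean(0)
    # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
    u, s, vt = np.linalg.svd(x.T @ y)
    if np.linalg.det(u @ vt) < 0:   # guard against reflections
        u[:, -1] *= -1
        s[-1] *= -1
    r = u @ vt
    scale = s.sum() / (x ** 2).sum()  # optimal isotropic scale
    return mpjpe(scale * x @ r + gt.mean(0), gt)
```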
3D human pose estimation results on MPI-INF-3DHP:

Method | Year | Publication | Highlight | MPJPE↓ | PCK↑ | AUC↑ | Code |
---|---|---|---|---|---|---|---|
HSTFormer | 2023 | arXiv'23 | spatial-temporal transformers | 28.3 | 98.0 | 78.6 | Code |
PoseFormerV2 | 2023 | CVPR'23 | frequency domain | 27.8 | 97.9 | 78.8 | Code |
Uplift and Upsample | 2023 | WACV'23 | efficient transformers | 46.9 | 95.4 | 67.6 | Code |
RS-Net | 2023 | TIP'23 | regular splitting graph network | - | 85.6 | 53.2 | Code |
DiffPose | 2023 | CVPR'23 | pose refinement | 29.1 | 98.0 | 75.9 | Code |
FTCM | 2023 | TCSVT'23 | frequency-temporal collaborative | 31.2 | 97.9 | 79.8 | Code |
STCFormer | 2023 | CVPR'23 | spatio-temporal transformer | 23.1 | 98.7 | 83.9 | Code |
PoseDA | 2023 | arXiv'23 | domain adaptation | 61.3 | 92.0 | 62.5 | Code |
TP-LSTMs | 2022 | TPAMI'22 | pose similarity metric | 48.8 | 82.6 | 81.3 | - |
AdaptPose | 2022 | CVPR'22 | cross datasets | 77.2 | 88.4 | 54.2 | Code |
3D-HPE-PAA | 2022 | TIP'22 | part aware attention | 69.4 | 90.3 | 57.8 | Code |
ElePose | 2022 | CVPR'22 | unsupervised | 54.0 | 86.0 | 50.1 | Code |
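PCK and AUC in the table above are the usual MPI-INF-3DHP metrics: the percentage of joints with an error under 150 mm, and the average of that percentage as the threshold sweeps from 0 to 150 mm. Here is a minimal sketch, assuming `errors` is a flat NumPy array of per-joint errors in millimeters; the 31-step sweep mirrors the common protocol but is an assumption here.

```python
import numpy as np

def pck(errors, threshold=150.0):
    """PCK: percentage of joints with error below the threshold (150 mm)."""
    return (errors < threshold).mean() * 100.0

def auc(errors, max_threshold=150.0, steps=31):
    """AUC: mean PCK over thresholds swept linearly from 0 to max_threshold."""
    thresholds = np.linspace(0.0, max_threshold, steps)
    return np.mean([pck(errors, t) for t in thresholds])
```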
Human mesh recovery results on Human3.6M and 3DPW:

Method | Publication | Highlight | Human3.6M MPJPE↓ | Human3.6M PA-MPJPE↓ | 3DPW MPJPE↓ | 3DPW PA-MPJPE↓ | 3DPW PVE↓ | Code |
---|---|---|---|---|---|---|---|---|
VirtualMarker | CVPR'23 | novel intermediate representation | 47.3 | 32.0 | 67.5 | 41.3 | 77.9 | Code |
NIKI | CVPR'23 | inverse kinematics | - | - | 71.3 | 40.6 | 86.6 | Code |
TORE | ICCV'23 | efficient transformer | 59.6 | 36.4 | 72.3 | 44.4 | 88.2 | Code |
JOTR | ICCV'23 | contrastive learning | - | - | 76.4 | 48.7 | 92.6 | Code |
HMDiff | ICCV'23 | reverse diffusion processing | 49.3 | 32.4 | 72.7 | 44.5 | 82.4 | Code |
ReFit | ICCV'23 | recurrent fitting network | 48.4 | 32.2 | 65.8 | 41.0 | - | Code |
PyMAF-X | TPAMI'23 | regression-based one-stage whole body | - | - | 74.2 | 45.3 | 87.0 | Code |
PointHMR | CVPR'23 | vertex-relevant feature extraction | 48.3 | 32.9 | 73.9 | 44.9 | 85.5 | - |
PLIKS | CVPR'23 | inverse kinematics | 47.0 | 34.5 | 60.5 | 38.5 | 73.3 | Code |
ProPose | CVPR'23 | learning analytical posterior probability | 45.7 | 29.1 | 68.3 | 40.6 | 79.4 | Code |
POTTER | CVPR'23 | pooling attention transformer | 56.5 | 35.1 | 75.0 | 44.8 | 87.4 | Code |
PoseExaminer | CVPR'23 | automated testing of out-of-distribution | - | - | 74.5 | 46.5 | 88.6 | Code |
MotionBERT | ICCV'23 | pretrained human representations | 43.1 | 27.8 | 68.8 | 40.6 | 79.4 | Code |
3DNBF | ICCV'23 | analysis-by-synthesis approach | - | - | 88.8 | 53.3 | - | Code |
FastMETRO | ECCV'22 | efficient architecture | 52.2 | 33.7 | 73.5 | 44.6 | 84.1 | Code |
CLIFF | ECCV'22 | multi-modality inputs | 47.1 | 32.7 | 69.0 | 43.0 | 81.2 | Code |
PARE | ICCV'21 | part-driven attention | - | - | 74.5 | 46.5 | 88.6 | Code |
Graphormer | ICCV'21 | GCNN-reinforced transformer | 51.2 | 34.5 | 74.7 | 45.6 | 87.7 | Code |
PSVT | CVPR'23 | spatio-temporal encoder | - | - | 73.1 | 43.5 | 84.0 | - |
GLoT | CVPR'23 | short-term and long-term temporal correlations | 67.0 | 46.3 | 80.7 | 50.6 | 96.3 | Code |
MPS-Net | CVPR'22 | temporally adjacent representations | 69.4 | 47.4 | 91.6 | 54.0 | 109.6 | Code |
MAED | ICCV'21 | multi-level attention | 56.4 | 38.7 | 79.1 | 45.7 | 92.6 | Code |
Lee et al. | ICCV'21 | uncertainty-aware | 58.4 | 38.4 | 92.8 | 52.2 | 106.1 | - |
TCMR | CVPR'21 | temporal consistency | 62.3 | 41.1 | 95.0 | 55.8 | 111.3 | - |
VIBE | CVPR'20 | self-attention temporal network | 65.6 | 41.4 | 82.9 | 51.9 | 99.1 | Code |
ImpHMR | CVPR'23 | implicitly imagine person in 3D space | - | - | 74.3 | 45.4 | 87.1 | - |
SGRE | ICCV'23 | sequentially global rotation estimation | - | - | 78.4 | 49.6 | 93.3 | Code |
PMCE | ICCV'23 | pose and mesh co-evolution network | 53.5 | 37.7 | 69.5 | 46.7 | 84.8 | Code |
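PVE in the table above is the per-vertex analogue of MPJPE: the mean Euclidean distance between corresponding vertices of the predicted and ground-truth meshes (6890 vertices for SMPL). A minimal sketch follows, under the assumption that both meshes share the same topology and are expressed in millimeters; aligning by the mean vertex rather than the pelvis joint is a simplification, as conventions differ across papers.

```python
import numpy as np

def pve(pred_vertices, gt_vertices):
    """PVE: mean per-vertex Euclidean distance between two meshes with the
    same topology, after removing the global translation."""
    pred = pred_vertices - pred_vertices.mean(0)  # (6890, 3) for SMPL
    gt = gt_vertices - gt_vertices.mean(0)
    return np.linalg.norm(pred - gt, axis=-1).mean()
```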