Paper list of monocular 3D human pose and shape estimation

Single-view recovery
2D pose to 3D pose
Video
Multi-view recovery
Multi-person
Other input
Detailed geometry
Self-supervised / weakly supervised
Interaction with scene
Occlusion
Total capture
Future prediction
Others
3D-Hands

Single-view recovery

[ECCV2020] I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image
[CVPR2020] 3D Human Mesh Regression with Dense Correspondence
[ICCV2019] Georgios Pavlakos et al., TexturePose: Supervising Human Mesh Estimation with Texture Consistency
[ICCV2019] Nikos Kolotouros et al., Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop
[ICCV2019] Saurabh Sharma et al., Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking
[ICCV2019] Kun Zhou et al., HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation
[CVPR2019] Long Zhao et al., Semantic Graph Convolutional Networks for 3D Human Pose Regression
[CVPR2019] Chen Li et al., Generating Multiple Hypotheses for 3D Human Pose Estimation With Mixture Density Network
[CVPR2019] Riza Alp Guler et al., HoloPose: Holistic 3D Human Reconstruction In-The-Wild
[CVPR2019] Xipeng Chen et al., Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation
[CVPR2019] Ikhsanul Habibie et al., In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations

2D pose to 3D pose

[ICCV2019] Hai Ci et al., Optimizing Network Structure for 3D Human Pose Estimation
[CVPR2019] Dario Pavllo et al., 3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training

Video

[CVPR2020] Muhammed Kocabas et al., VIBE: Video Inference for Human Body Pose and Shape Estimation
[CVPR2019] Angjoo Kanazawa et al., Learning 3D Human Dynamics from Video
[ICCV2019] Jason Y. Zhang et al., Predicting 3D Human Dynamics from Video
[ICCV2019] Yu Sun et al., Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation
[CVPR2019] Anurag Arnab et al., Exploiting temporal context for 3D human pose estimation in the wild

Multi-view recovery

[ICCV2019] Nitin Saini et al., Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles
[ICCV2019] Haibo Qiu et al., Cross View Fusion for 3D Human Pose Estimation
[ICCV2019] Junbang Liang et al., Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images
[ICCV2019] Karim Iskakov et al., Learnable Triangulation of Human Pose
[CVPR2019] Junting Dong et al., Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views
[CVPR2020] Yuxiang Zhang et al., 4D Association Graph for Realtime Multi-Person Motion Capture Using Multiple Video Cameras
[ECCV2020] Junting Dong et al., Motion Capture from Internet Videos

Multi-person

[CVPR2020] Mihai Fieraru et al., Three-dimensional Reconstruction of Human Interactions
[SIGGRAPH2020] Dushyant Mehta et al., XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera
[ICCV2019] Gyeongsik Moon et al., Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image
[CVPR2020] Matteo Fabbri et al., Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
[CVPR2020] Abdallah Benzine et al., PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation
[CVPR2020] Wen Jiang et al., Coherent Reconstruction of Multiple Humans From a Single Image

Other input

[ICCV2019] Haiyong Jiang et al., Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds
[ICCV2019] Denis Tome et al., xR-EgoPose: Egocentric 3D Human Pose from an HMD Camera

Detailed geometry

[ECCV2020] CLOTH3D: Clothed 3D Humans
[ECCV2020] Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction
[ECCV2020] Luyang Zhu et al., Reconstructing NBA Players
[CVPR2020] Shunsuke Saito et al., PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
[ICCV2019] Shunsuke Saito et al., PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization
[ICCV2019] Zerong Zheng et al., DeepHuman: 3D Human Reconstruction From a Single Image
[ICCV2019] Albert Pumarola et al., 3DPeople: Modeling the Geometry of Dressed Humans
[ICCV2019] Thiemo Alldieck et al., Tex2Shape: Detailed Full Human Body Geometry From a Single Image
[ICCV2019] Sicong Tang et al., A Neural Network for Detailed Human Depth Estimation From a Single Image
[CVPR2019 Oral] Ryota Natsume et al., SiCloPe: Silhouette-Based Clothed People
[CVPR2019 Oral] Nikos Kolotouros et al., Convolutional Mesh Regression for Single-Image Human Shape Reconstruction
[CVPR2019 Oral] Hao Zhu et al., Detailed Human Shape Estimation from a Single Image by Hierarchical Mesh Deformation
[CVPR2019] Thiemo Alldieck et al., Learning to Reconstruct People in Clothing from a Single RGB Camera
[CVPR2019] Tao Yu et al., SimulCap: Single-View Human Performance Capture with Cloth Simulation
[CVPR2020] Marc Habermann et al., DeepCap: Monocular Human Performance Capture Using Weak Supervision
[TOG2019] Marc Habermann et al., LiveCap: Real-Time Human Performance Capture From Monocular Video
[ICCV2019] Valentin Gabeur et al., Moulding Humans: Non-parametric 3D Human Shape Estimation from Single Images
[CVPR2020] Feitong Tan et al., Self-Supervised Human Depth Estimation from Monocular Videos
[CVPR2020] Zeng Huang et al., ARCH: Animatable Reconstruction of Clothed Humans
[CVPR2020] Hayato Onizuka et al., TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell

Self-supervised / weakly supervised

[CVPR2019] Muhammed Kocabas et al., Self-Supervised Learning of 3D Human Pose using Multi-view Geometry
[CVPR2019] Bastian Wandt et al., RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation

Interaction with scene

[ICCV2019] Mohamed Hassan et al., Resolving 3D Human Pose Ambiguities with 3D Scene Constraints
[ICCV2019] Yixin Chen et al., Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense
[CVPR2020] Yong-Lu Li et al., Detailed 2D-3D Joint Representation for Human-Object Interaction

Occlusion

[ICCV2019] Yu Cheng et al., Occlusion-Aware Networks for 3D Human Pose Estimation in Video
[CVPR2020] Object-Occluded Human Shape and Pose Estimation from a Single Color Image

Total capture

[CVPR2019] Donglai Xiang et al., Monocular Total Capture: Posing Face, Body and Hands in the Wild
[CVPR2019] Georgios Pavlakos et al., Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Future prediction

[ECCV2020] Long-term Human Motion Prediction with Scene Context
[ICCV2019] Zhi Li et al., On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos
[ICCV2019] A. Hernandez Ruiz et al., Human Motion Prediction via Spatio-Temporal Inpainting
[ICCV2019] Emre Aksan et al., Structured Prediction Helps 3D Human Motion Modelling
[ICCV2019] Wei Mao et al., Learning Trajectory Dependencies for Human Motion Prediction
[CVPR2019] Zhenguang Liu et al., Towards Natural and Accurate Future Motion Prediction of Humans and Animals
[CVPR2019] Anand Gopalakrishnan et al., A Neural Temporal Model for Human Motion Prediction

Others

[ICCV2019] Yu Rong et al., Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild
[CVPR2019] Chung-Yi Weng et al., Photo Wake-Up: 3D Character Animation from a Single Photo
[CVPR2019] Rohit Pandey et al., Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning

3D-Hands

[CVPR2020] Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild

memoiry/Monocular_3D_human
