A list of awesome egocentric human body pose estimation works and related resources. While some repositories, such as awesome-egocentric-vision, compile studies across the wide field of egocentric vision, none focuses specifically on the niche area of egocentric human body pose estimation.
We split this topic by different capture setups:
- Egocentric Inside-In Pose Estimation
- Egocentric Inside-Out Pose Estimation
- IMU-Based Egocentric Pose Estimation
- Headset-Based Egocentric Pose Estimation
- Third-Person View Egocentric Pose Estimation
- Mixed Setup
The inside-in vision setup involves body-worn cameras or sensors directed toward the person or object of interest, so data is captured from the motion capture subject's own viewpoint. This setup can be seen on the Oculus Quest 2 and Apple Vision Pro.
Setup | Dataset | Number of Frames | Synthetic or Real | Number of Actors | Scene Annotation | FPS | Link |
---|---|---|---|---|---|---|---|
Monocular Fisheye | Mo2Cap2[2019-2] | 530K | Synthetic | - | No | - | Link |
Monocular Fisheye | xR-egopose[2019-3] | 252K Train + 16 Val | Synthetic | 34 | No | 30 | Link |
Monocular Fisheye | EgoPW[2022-1] | 318K | Real (pseudo gt) | 10 | No | 25 | Link |
Monocular Fisheye | EgoPW-Scene[2023-1] | 92K | Real (pseudo gt) | 10 | Pseudo Annotations | 25 | Link |
Monocular Fisheye | EgoWholeBody[2023-5] | 700K | Synthetic | 14 | No | 30 | - |
Stereo Perspective | EgoGlass[2021-3] | 172K | Real | 10 | No | 30 | - |
Stereo Fisheye | UnrealEgo[2022-2] | 450K * 2 views | Synthetic | 17 | No | 25 | Link |
Stereo Fisheye | UnrealEgo2[2024-2] | 1.25M * 2 views | Synthetic | 17 | Yes | 25 | - |

Setup | Dataset | Number of Frames | Synthetic or Real | Scene Annotation | FPS | Dataset Link | Leaderboard |
---|---|---|---|---|---|---|---|
Monocular Fisheye | Mo2Cap2[2019-2] | 5K | Real | No | 25 | Link | - |
Monocular Fisheye | xR-egopose[2019-3] | 115K | Synthetic | No | 30 | Link | - |
Monocular Fisheye | GlobalEgoMocap[2021-2] | 318K | Real | No | 25 | Link | Paper With Code |
Monocular Fisheye | SceneEgo[2023-1] | 28K | Real | Yes | 25 | Link | Paper With Code |
Monocular Fisheye | EgoWholeBody[2023-5] | 133K | Synthetic | No | 30 | - | - |
Stereo Fisheye | UnrealEgo[2022-2] | 48K * 2 views | Synthetic | No | 25 | Link | Paper With Code |
Stereo Fisheye | UnrealEgo2[2024-2] | 123K * 2 views | Synthetic | Yes | 25 | - | - |
Stereo Fisheye | UnrealEgo2-RW[2024-2] | 130K * 2 views | Real | Yes | 25 | - | - |
- Rhodin, Helge, et al. "EgoCap: Egocentric marker-less motion capture with two fisheye cameras." ACM Transactions on Graphics (TOG) 35.6 (2016): 1-11. [project page]
- Xu, Weipeng, et al. "Mo2Cap2: Real-time mobile 3D motion capture with a cap-mounted fisheye camera." IEEE Transactions on Visualization and Computer Graphics 25.5 (2019): 2093-2101. [project page]
- Tome, Denis, et al. "xR-EgoPose: Egocentric 3D human pose from an HMD camera." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019. [dataset]
- Zhang, Yahui, Shaodi You, and Theo Gevers. "Automatic calibration of the fisheye camera for egocentric 3d human pose estimation from a single image." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021.
- Wang, Jian, et al. "Estimating egocentric 3d human pose in global space." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. [project page] [dataset] [demo]
- Zhao, Dongxu, et al. "EgoGlass: Egocentric-view human pose estimation from an eyeglass frame." 2021 International Conference on 3D Vision (3DV). IEEE, 2021.
- Wang, Jian, et al. "Estimating egocentric 3d human pose in the wild with external weak supervision." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. [project page] [dataset] [demo]
- Akada, Hiroyasu, et al. "UnrealEgo: A new dataset for robust egocentric 3d human motion capture." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022. [project page] [code] [dataset] [demo]
- Park, Jinman, et al. "Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation." arXiv preprint arXiv:2206.04785 (2022).
- Liu, Yuxuan, et al. "Ego+ X: An Egocentric Vision System for Global 3D Human Pose Estimation and Social Interaction Characterization." 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022.
- Wang, Jian, et al. "Scene-aware Egocentric 3D Human Pose Estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. [project page] [dataset] [code]
- Liu, Yuxuan, et al. "EgoHMR: Egocentric Human Mesh Recovery via Hierarchical Latent Diffusion Model." 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023.
- Liu, Yuxuan, et al. "EgoFish3D: Egocentric 3D Pose Estimation from a Fisheye Camera via Self-Supervised Learning." IEEE Transactions on Multimedia (2023).
- Kang, Taeho, et al. "Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views." SIGGRAPH Asia 2023 Conference Papers. 2023.
- Wang, Jian, et al. "Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement." arXiv preprint arXiv:2311.16495 (2023).
- Cuevas-Velasquez, Hanz, et al. "SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras." arXiv preprint arXiv:2401.14785 (2024).
- Akada, Hiroyasu, et al. "3D Human Pose Perception from Egocentric Stereo Videos." arXiv preprint arXiv:2401.00889 (2024).
The inside-out vision setup employs cameras or sensors positioned on the person or device, looking outward at the environment. This approach is used in most virtual reality (VR) headsets and augmented reality (AR) systems, where cameras attached to the headset capture the user's surroundings and interpret motion relative to them.
- Ego-Body Pose Estimation via Ego-Head Pose Estimation - Jiaman Li, Karen Liu, and Jiajun Wu. In CVPR 2023.
- You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions - Evonne Ng, Donglai Xiang, Hanbyul Joo, and Kristen Grauman. In CVPR 2020. [demo] [project page] [dataset] [code]
- Ego-Pose Estimation and Forecasting as Real-Time PD Control - Ye Yuan and Kris Kitani. In ICCV 2019. [code] [project page] [demo]
- Seeing Invisible Poses: Estimating 3D Body Pose from Egocentric Video - Hao Jiang and Kristen Grauman. In CVPR 2017.
The Inertial Measurement Unit (IMU) setup utilizes sensors typically composed of accelerometers, gyroscopes, and sometimes magnetometers. In egocentric motion capture, IMUs are placed on the human body to capture dynamic motion and limb orientation changes.
Some methods estimate the full-body pose from the headset's 6DoF pose (head pose) and the VR controllers' 6DoF poses (hand poses). Because the head and hand poses come from the headset's SLAM and the VR controllers, the input signal is much less noisy than in the IMU setup.
The third-person setup refers to motion capture techniques in which a third person carries moving cameras that observe the motion capture subject.
- EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices - Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo, Siyu Tang. In ECCV 2022. [project page] [dataset] [code]
A combination of the aforementioned setups.