
WS-GCN: Integrating GCN with Weak Supervision for Enhanced 3D Human Pose Estimation (ICCAI 2024)

This repository contains the PyTorch implementation of WS-GCN: Integrating GCN with Weak Supervision for Enhanced 3D Human Pose Estimation, by Jiang Zhenxiang and Chen Yingyu.

Introduction

Precisely estimating 3D human pose is an important and challenging goal in computer vision. The field has advanced significantly with the advent of deep learning and graph convolutional networks (GCNs), especially for contextualizing and analyzing human motion from visual input. This work presents a novel strategy that combines the structural strengths of GCNs with the robustness of weakly supervised learning to address two fundamental challenges of 3D human pose estimation: the scarcity of large in-the-wild 3D datasets and the difficulty of modeling 3D spatial data.

The central component of our proposal is WS-GCN, a weakly supervised semantic graph convolutional network. By building a semantic graph, the model represents the complex semantic and spatial relationships between anatomical joints, and the integration of a non-local layer greatly improves the accuracy of lifting 2D coordinates into 3D space. Empirical evaluations demonstrate the effectiveness of our methodology, especially when the model is combined with a bone-length penalty and a fully supervised training warm-up stage; together, these improvements strengthen its performance in weakly supervised settings.

Under full supervision, the model achieves a Mean Per Joint Position Error (MPJPE) of 41.95 mm and a Procrustes-aligned MPJPE (P-MPJPE) of 33.40 mm. Under weak supervision, it attains an MPJPE of 49.23 mm and a P-MPJPE of 39.88 mm. Notably, the model ranks third overall in weakly supervised 3D human pose estimation on the Human3.6M dataset and achieves state-of-the-art results for single-view, single-frame weakly supervised 3D human pose estimation on the same dataset. This work not only represents a step forward in 3D human pose estimation, but also lays a foundation for further research and potential applications in related fields such as virtual reality and interactive computing.
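
To make the semantic-graph idea concrete, here is a minimal PyTorch sketch of a graph convolution with learnable per-edge weights over the skeleton. It is illustrative only: the class name, layer sizes, and adjacency handling are our assumptions, not the repository's actual implementation (which additionally uses a non-local layer).

```python
import torch
import torch.nn as nn


class SemGraphConv(nn.Module):
    """Graph convolution with learnable per-edge ("semantic") weights."""

    def __init__(self, in_features, out_features, adj):
        super().__init__()
        # Separate transforms for a joint's own features and its neighbors',
        # in the spirit of SemGCN-style layers.
        self.w_self = nn.Linear(in_features, out_features, bias=False)
        self.w_neigh = nn.Linear(in_features, out_features, bias=False)
        self.register_buffer("adj", adj)                  # (J, J) 0/1 skeleton
        # Learnable edge logits; a row-wise softmax re-weights each neighbor.
        self.edge_logits = nn.Parameter(torch.zeros_like(adj))

    def forward(self, x):                                 # x: (B, J, C_in)
        # Non-edges get -inf so the softmax assigns them zero weight.
        neg_inf = torch.full_like(self.adj, float("-inf"))
        weights = torch.softmax(
            torch.where(self.adj > 0, self.edge_logits, neg_inf), dim=-1)
        neighbors = torch.einsum("ij,bjc->bic", weights, self.w_neigh(x))
        return self.w_self(x) + neighbors


# Toy usage: 17 Human3.6M-style joints, 2D inputs lifted to 128-dim features.
# The adjacency below is a random symmetric placeholder, not the real skeleton.
J = 17
adj = (torch.rand(J, J) > 0.7).float()
adj = ((adj + adj.T) > 0).float()
adj.fill_diagonal_(1.0)                                   # keep every row non-empty
layer = SemGraphConv(2, 128, adj)
out = layer(torch.randn(8, J, 2))                         # -> (8, 17, 128)
```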

Results on Human3.6M

Comparison of results between full supervision and weak supervision.

| Supervision | Bone Loss | Warm-up | MPJPE | P-MPJPE |
|-------------|-----------|---------|-------|---------|
| Full | N/A | N/A | 41.95 mm | 33.40 mm |
| Weak | No | Yes | 53.16 mm | 43.77 mm |
| Weak | Yes | Yes | 49.23 mm | 39.88 mm |
| Weak | Yes | No | 63.50 mm | 52.14 mm |
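
The "Bone Loss" column refers to the bone-length penalty. A plausible minimal form, sketched below, is an L1 term between predicted bone lengths and reference lengths (e.g., averaged from the fully supervised warm-up data); the function name, bone list, and reference-length source here are our assumptions, not the repository's code.

```python
import torch

# Parent-child joint pairs of a simplified skeleton; the indices are
# placeholders for whatever joint ordering the dataset uses.
BONES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6)]


def bone_length_loss(pred_3d, ref_lengths, bones=BONES):
    """L1 penalty between predicted and reference bone lengths.

    pred_3d:     (B, J, 3) predicted 3D joint positions
    ref_lengths: (len(bones),) target lengths in the same units
    """
    parents = torch.tensor([p for p, _ in bones], device=pred_3d.device)
    children = torch.tensor([c for _, c in bones], device=pred_3d.device)
    lengths = (pred_3d[:, children] - pred_3d[:, parents]).norm(dim=-1)
    return (lengths - ref_lengths).abs().mean()
```

In the weakly supervised stage such a term would be added to the main loss with a weighting coefficient; the ablation above suggests it accounts for roughly 4 mm of MPJPE.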

Ranking of single-frame, single-view weakly supervised 3D human pose estimation methods on Human3.6M. Data source: Papers with Code, "Weakly-supervised 3D Human Pose Estimation on Human3.6M".

| Network | Data View | Frame/Sequence | MPJPE |
|---------|-----------|----------------|-------|
| AdaptPose | Single | Sequence | 42.5 mm |
| HDVR | Single | Sequence | 47.3 mm |
| WS-GCN | Single | Frame | 49.23 mm |
| PGNIS | Single | Frame | 50.8 mm |
| MetaPose | Multi | Frame | 56.0 mm |
| PoseAug | Single | Frame | 56.7 mm |
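
For reference, the two reported metrics can be computed as in the standard NumPy sketch below (single pose pair; root alignment and per-action averaging omitted). This is generic evaluation code, not taken from the repository.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error; pred and gt have shape (J, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()


def p_mpjpe(pred, gt):
    """MPJPE after a similarity (Procrustes) alignment of pred onto gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix.
    u, s, vt = np.linalg.svd(p.T @ g)
    if np.linalg.det(u @ vt) < 0:     # repair an improper rotation (reflection)
        u[:, -1] *= -1
        s[-1] *= -1
    rot = u @ vt
    scale = s.sum() / (p ** 2).sum()  # optimal similarity scale
    return mpjpe(scale * p @ rot + mu_g, gt)
```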

References

[1] Martinez et al. A simple yet effective baseline for 3d human pose estimation. ICCV 2017.

[2] Pavllo et al. 3D human pose estimation in video with temporal convolutions and semi-supervised training. CVPR 2019.

[3] Zhao et al. Semantic Graph Convolutional Networks for 3D Human Pose Regression. CVPR 2019.

Acknowledgement

Part of our code is borrowed from the following repositories.

We thank the authors for releasing their code. Please also consider citing their works.