
Official code of "DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion"


DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion

Cédric Rommel, Eduardo Valle, Mickaël Chen, Souhaiel Khalfaoui, Renaud Marlet, Matthieu Cord, Patrick Pérez

International Conference on Computer Vision (ICCV),
Workshop on Analysis and Modeling of Faces and Gestures, 2023

[arXiv] [webpage]

Abstract

We present an innovative approach to 3D Human Pose Estimation (3D-HPE) by integrating cutting-edge diffusion models, which have revolutionized diverse fields, but are relatively unexplored in 3D-HPE. We show that diffusion models enhance the accuracy, robustness, and coherence of human pose estimations. We introduce DiffHPE, a novel strategy for harnessing diffusion models in 3D-HPE, and demonstrate its ability to refine standard supervised 3D-HPE. We also show how diffusion models lead to more robust estimations in the face of occlusions, and improve the time-coherence and the sagittal symmetry of predictions. Using the Human3.6M dataset, we illustrate the effectiveness of our approach and its superiority over existing models, even under adverse situations where the occlusion patterns in training do not match those in inference. Our findings indicate that while standalone diffusion models provide commendable performance, their accuracy is even better in combination with supervised models, opening exciting new avenues for 3D-HPE research.


Getting started

Requirements

The code requires Python 3.7 or later. The file requirements.txt contains the full list of required Python modules.

pip install -r requirements.txt

You may optionally install MLflow for experiment tracking:

pip install mlflow

Data

The Human3.6M dataset should be set up following the AnyGCN repository. Please refer to it for setup instructions.

Consider adding the path where the data is stored to the data.data_dir field in the conf/config.yaml file. Alternatively, this path can also be passed directly to the training/test command line if preferred, as explained below.
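For instance, assuming the nesting implied by the data.data_dir override used in the commands below, the corresponding entry in conf/config.yaml could look like this minimal sketch:

data:
  data_dir: /PATH/TO/H36M/DATA/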

Checkpoints

You can download checkpoints of pretrained models from the assets of the latest code release, and place them inside pre-trained-models, in the subfolders diff_model_ckpts (for DiffHPE-2D and DiffHPE-Wrapper checkpoints) and conditioners_ckpts (for all others).
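Assuming the checkpoint names used in the commands below, the resulting folder layout would roughly look like this:

pre-trained-models/
  diff_model_ckpts/
    diffhpe-2d
    diffhpe-wrapper
  conditioners_ckpts/
    prt_mixste_h36m_L27_C64_structured_frame_miss.pt
    ...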

Evaluation

Both pre-trained DiffHPE-2D and DiffHPE-Wrapper checkpoints are available in the pre-trained-models/diff_model_ckpts folder and can be evaluated.

For example, run the command below to evaluate DiffHPE-Wrapper on 27-frame inputs:

python main_h36m_lifting.py run.mode=test data.data_dir=/PATH/TO/H36M/DATA/ eval.model_l=pre-trained-models/diff_model_ckpts/diffhpe-wrapper

Note that you can omit the data.data_dir part of the command if you filled the corresponding field in conf/config.yaml beforehand.

To evaluate DiffHPE-2D, just change the path passed to eval.model_l as follows:

python main_h36m_lifting.py run.mode=test data.data_dir=/PATH/TO/H36M/DATA/ eval.model_l=pre-trained-models/diff_model_ckpts/diffhpe-2d

Visualization

Given a pre-trained model checkpoint, you can visualize the predicted poses using the script viz.py. For example:

python viz.py data.data_dir=/PATH/TO/H36M/DATA/ eval.model_l=pre-trained-models/diff_model_ckpts/diffhpe-wrapper viz.viz_limit=600

The visualization configuration can be changed within the viz field, in conf/config.yaml.
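As an illustration, viz_limit is the only field confirmed by the command above; a minimal sketch of that part of conf/config.yaml could be:

viz:
  viz_limit: 600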

Training

DiffHPE-2D model

To train DiffHPE-2D from scratch, run:

python main_h36m_lifting.py data.data_dir=/PATH/TO/H36M/DATA/ +train=diffhpe-2d +diffusion=diffhpe-2d

DiffHPE-Wrapper model

Likewise, you can train DiffHPE-Wrapper from scratch with this command:

python main_h36m_lifting.py data.data_dir=/PATH/TO/H36M/DATA/

Training with different occlusions

The previous commands will train the diffusion models with standard data. If you want to train with simulated occlusions, you can choose a different data config from conf/data. For example, to train a DiffHPE-2D model with consecutive-frame occlusions, run:

python main_h36m_lifting.py +data=lifting_cpn17_test_seq27_frame_miss data.data_dir=/PATH/TO/H36M/DATA/ +train=diffhpe-2d +diffusion=diffhpe-2d
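To see which simulated-occlusion configurations are available, you can simply list the data config folder:

ls conf/data/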

Note that, in the case of DiffHPE-Wrapper, you also need to change the checkpoint of the pre-trained conditioner model to one that was trained with the same type of occlusions:

python main_h36m_lifting.py +data=lifting_cpn17_test_seq27_frame_miss data.data_dir=/PATH/TO/H36M/DATA/ diffusion.cond_ckpt=pre-trained-models/conditioners_ckpts/prt_mixste_h36m_L27_C64_structured_frame_miss.pt

MixSTE baseline

This codebase can also be used to retrain the supervised MixSTE baseline (without training tricks):

python main_h36m_lifting.py data.data_dir=/PATH/TO/H36M/DATA/ +train=sup_mixste_seq27 +diffusion=sup_mixste_seq27

Acknowledgments

A great part of the diffusion code was copied and modified from A generic diffusion-based approach for 3D human pose prediction in the wild, which is itself heavily inspired by CSDI.

Human pose lifting code, as well as GCN-related code, was borrowed from AnyGCN, which itself builds on top of several other repositories.

The baseline model MixSTE was modified from its official paper repository.

Citation

@INPROCEEDINGS{rommel2023diffhpe,
  title={DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion},
  author={Rommel, C{\'e}dric and Valle, Eduardo and Chen, Micka{\"e}l and Khalfaoui, Souhaiel and Marlet, Renaud and Cord, Matthieu and P{\'e}rez, Patrick},
  booktitle={International Conference on Computer Vision Workshops (ICCVW)},
  year={2023}
}