DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model

Official implementation of the IROS 2023 paper "DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model".

(Demo animation: S11_Walking sequence)

Environment

This code was built and tested in the following environment:

  • Python 3.8.13
  • PyTorch 1.12.1
  • CUDA 11.2

You can create and activate the conda environment as follows:

conda env create -f requirements.yaml
conda activate diffupose
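
After activating the environment, a quick sanity check can confirm that the PyTorch and CUDA versions match the list above. This snippet is illustrative and not part of the repository:

import torch

print(torch.__version__)          # expect 1.12.1
print(torch.version.cuda)         # CUDA version PyTorch was built with
print(torch.cuda.is_available())  # True if a GPU is visible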

Dataset setup

You can set up the Human3.6M and HumanEva-I datasets by following the instructions in the VideoPose3D repository.

Alternatively, you can download the data directly from the Google Drive link.

Download the archive and unzip the .zip file into the ./data folder.

Make sure your repository ends up with './data/data_3d_h36m.npz', './data/data_2d_h36m_hr.npz', and './data/data_2d_h36m_gt.npz'.
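
As a quick sanity check, the three archives can be opened with NumPy. The keys inside each file follow the VideoPose3D data convention, which is not documented here, so the snippet simply lists whatever keys are present rather than assuming names:

import numpy as np

# Illustrative check: confirm the three archives exist and load.
for path in ['./data/data_3d_h36m.npz',
             './data/data_2d_h36m_hr.npz',
             './data/data_2d_h36m_gt.npz']:
    with np.load(path, allow_pickle=True) as data:
        print(path, list(data.keys()))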

Training from scratch

If you want to train our model from scratch using HR-Net detections, please run

python run.py -k hr -b 1024

Alternatively, if you want to train with the 2D ground truth, please run

python run.py -k gt -b 1024
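
For background, DiffuPose follows the standard DDPM recipe: the clean 3D pose is progressively noised, and a network conditioned on the 2D detections is trained to predict that noise. The sketch below is a generic illustration of this objective under assumed shapes (17 joints, a hypothetical denoiser network), not the repository's actual training code:

import torch

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule (assumed)
alpha_bar = torch.cumprod(1.0 - betas, 0)  # cumulative signal fraction

def ddpm_loss(denoiser, x0, cond):
    """One DDPM training step: predict the noise added to clean 3D poses.

    x0:   (B, 17, 3) clean 3D poses; cond: (B, 17, 2) 2D detections.
    The denoiser signature is a placeholder, not the repo's interface.
    """
    B = x0.shape[0]
    t = torch.randint(0, T, (B,))                       # random timestep per sample
    a = alpha_bar[t].view(B, 1, 1)
    eps = torch.randn_like(x0)                          # Gaussian noise
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps        # forward diffusion q(x_t | x_0)
    eps_pred = denoiser(x_t, t, cond)                   # network predicts the noise
    return torch.nn.functional.mse_loss(eps_pred, eps)  # simple DDPM objective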

Evaluating pre-trained model

We provide our pre-trained 384-dimension model in the results folder (with HR-Net-detected 2D poses as input). To evaluate the model, please run

python run.py -k hr --test-load best_model.pt

which will result in 50.0 mm error (MPJPE).
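
MPJPE (mean per-joint position error) is the average Euclidean distance, in millimetres, between predicted and ground-truth joints over all joints and frames. A minimal reference implementation of the standard metric:

import torch

def mpjpe(pred, gt):
    """Mean per-joint position error for (frames, joints, 3) tensors in mm."""
    return (pred - gt).norm(dim=-1).mean()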

To obtain the best result with 10 samples, run

python run.py -k hr --test-load best_model.pt --num-sample 10

which will result in 49.4 mm error (MPJPE).
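
Note that --num-sample draws several diffusion samples per input. This README does not state how the samples are combined, so the mean aggregation below is only an assumption for illustration:

import torch

def aggregate_samples(samples):
    """samples: (K, frames, joints, 3) stacked diffusion samples.

    Averaging over K is an assumption; the repository may instead
    select among samples by some other criterion.
    """
    return samples.mean(dim=0)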

Visualization

Our code is compatible with VideoPose3D. Please refer to their GitHub page for detailed instructions.