Method | MPJPE(mm) | FPS |
---|---|---|
PoseFormer | 44.3 | 1952 |
Anatomy3D | 44.1 | 429 |
P-STMO-S | 43.0 | 3504 |
P-STMO | 42.8 | 3040 |
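For reference, MPJPE (mean per-joint position error) is the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and frames. Below is a minimal sketch in plain Python, assuming poses are given as nested lists of (x, y, z) coordinates in millimetres. The function name and data layout are illustrative and not taken from this repo.

```python
import math

def mpjpe(predicted, target):
    """Mean per-joint position error over all frames and joints.

    predicted, target: lists of frames; each frame is a list of (x, y, z) joints.
    """
    assert len(predicted) == len(target)
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(predicted, target):
        assert len(pred_frame) == len(gt_frame)
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # Euclidean distance for one joint
            count += 1
    return total / count

# Toy data: one frame with two joints (illustrative values only).
gt = [[(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]]
print(mpjpe(pred, gt))  # (5 + 12) / 2 = 8.5
```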
Make sure you have the following dependencies installed:
- PyTorch >= 0.4.0
- NumPy
- Matplotlib == 3.1.0
- FFmpeg (if you want to export MP4 videos)
- ImageMagick (if you want to export GIFs)
- Matlab
Our model is evaluated on the Human3.6M and MPI-INF-3DHP datasets.
We set up the Human3.6M dataset in the same way as VideoPose3D. You can download the processed data from here. data_2d_h36m_gt.npz is the ground truth of 2D keypoints. data_2d_h36m_cpn_ft_h36m_dbb.npz contains the 2D keypoints obtained by CPN. data_3d_h36m.npz is the ground truth of 3D human joints. Put them in the ./dataset directory.
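Before training, it can help to confirm that the downloaded archives are in place and readable. An .npz file is a zip archive holding one .npy file per array, so the array names can be listed without loading the data. The helper below is a hypothetical convenience and not part of this repo. Because the array names inside the archives are not stated here, the sketch only reports whatever it finds.

```python
import zipfile
from pathlib import Path

def list_npz_arrays(path):
    """Return the array names stored in an .npz archive without loading them."""
    with zipfile.ZipFile(path) as archive:
        # Each array is stored as "<name>.npy" inside the zip archive.
        return [Path(name).stem for name in archive.namelist() if name.endswith(".npy")]

if __name__ == "__main__":
    expected = (
        "data_2d_h36m_gt.npz",
        "data_2d_h36m_cpn_ft_h36m_dbb.npz",
        "data_3d_h36m.npz",
    )
    for filename in expected:
        path = Path("dataset") / filename
        if path.exists():
            print(filename, list_npz_arrays(path))
        else:
            print(filename, "is missing from ./dataset")
```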
We set up the MPI-INF-3DHP dataset ourselves, converting the original data in .mat format to processed data in .npz format with data_to_npz_3dhp.py and data_to_npz_3dhp_test.py. You can download the processed data from here. Put them in the ./dataset directory.
You can download our pre-trained models from here. Put them in the ./checkpoint directory.
To evaluate our P-STMO-S model on the ground truth of 2D keypoints, please run:
python run.py -k gt -f 243 -tds 2 --reload 1 --previous_dir checkpoint/PSTMOS_no_refine_15_2936_h36m_gt.pth
The following models are trained using the 2D keypoints obtained by CPN as inputs.
To evaluate our P-STMO-S model, please run:
python run.py -f 243 -tds 2 --reload 1 --previous_dir checkpoint/PSTMOS_no_refine_28_4306_h36m_cpn.pth
To evaluate our P-STMO model, please run:
python run.py -f 243 -tds 2 --reload 1 --layers 4 --previous_dir checkpoint/PSTMO_no_refine_11_4288_h36m_cpn.pth
To evaluate our P-STMO model using the refine module proposed in ST-GCN, please run:
python run.py -f 243 -tds 2 --reload 1 --refine_reload 1 --refine --layers 4 --previous_dir checkpoint/PSTMO_no_refine_6_4215_h36m_cpn.pth --previous_refine_name checkpoint/PSTMO_refine_6_4215_h36m_cpn.pth
To evaluate our P-STMO-S model on the MPI-INF-3DHP dataset, please run:
python run_3dhp.py -f 81 --reload 1 --previous_dir checkpoint/PSTMOS_no_refine_50_3203_3dhp.pth
After that, the 3D pose predictions are saved as checkpoint/inference_data.mat. These results can be evaluated in Matlab by running 3dhp_test/test_util/mpii_test_predictions_py.m. The final evaluation results are obtained by averaging the sequence-wise evaluation results over the number of frames.
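Read as a frame-weighted mean, the final averaging step can be sketched as follows. The per-sequence values and frame counts are made-up numbers for illustration. The actual values come from the Matlab evaluation script, and the function name is hypothetical.

```python
def frame_weighted_average(sequence_results, frame_counts):
    """Average per-sequence results, weighting each sequence by its frame count."""
    if len(sequence_results) != len(frame_counts):
        raise ValueError("one frame count is required per sequence")
    total_frames = sum(frame_counts)
    weighted_sum = sum(r * n for r, n in zip(sequence_results, frame_counts))
    return weighted_sum / total_frames

# Illustrative numbers only: three sequences with different lengths.
per_sequence_error = [30.0, 40.0, 50.0]
frames_per_sequence = [100, 300, 600]
print(frame_weighted_average(per_sequence_error, frames_per_sequence))  # 45.0
```

With these numbers an unweighted mean over sequences would give 40.0, so longer sequences pull the final figure toward their own error.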
For the pre-training stage, our model aims to solve the masked pose modeling task. Please run:
python run.py -f 243 -b 160 --MAE --train 1 --layers 3 -tds 2 -tmr 0.8 -smn 2 --lr 0.0001 -lrd 0.97
Different models use different configurations as follows.
Model | -k | --layers | -tmr | -smn |
---|---|---|---|---|
P-STMO-S (GT) | gt | 3 | 0.8 | 7 |
P-STMO-S | default | 3 | 0.8 | 2 |
P-STMO | default | 4 | 0.6 | 3 |
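As a rough illustration of masked pose modeling, the sketch below picks which frames and joints to hide in one input sequence. It assumes that -tmr is the fraction of frames that are masked and that -smn is the number of joints masked in each remaining frame. These readings of the flags, the 17-joint skeleton, and the function name are assumptions, and the actual sampling in run.py may differ.

```python
import random

def sample_masks(num_frames, num_joints, temporal_mask_rate, spatial_mask_num, seed=0):
    """Choose masked frames and, for each visible frame, masked joints."""
    rng = random.Random(seed)
    num_masked_frames = int(num_frames * temporal_mask_rate)
    masked_frames = set(rng.sample(range(num_frames), num_masked_frames))
    masked_joints = {}
    for frame in range(num_frames):
        if frame not in masked_frames:
            # Hide a fixed number of joints in every frame that stays visible.
            masked_joints[frame] = sorted(rng.sample(range(num_joints), spatial_mask_num))
    return masked_frames, masked_joints

# Values from the P-STMO-S row: -f 243, -tmr 0.8, -smn 2 (17 joints is an assumption).
hidden_frames, hidden_joints = sample_masks(
    num_frames=243, num_joints=17, temporal_mask_rate=0.8, spatial_mask_num=2
)
print(len(hidden_frames), len(hidden_joints))  # 194 masked frames, 49 visible frames
```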
For the fine-tuning stage, the pre-trained encoder is loaded to our STMO model and fine-tuned. Please run:
python run.py -f 243 -b 160 --train 1 --layers 3 -tds 2 --lr 0.0007 -lrd 0.97 --MAE_reload 1 --previous_dir your_best_model_in_stage_I.pth
Different models use different configurations as follows.
Model | -k | --layers | --lr |
---|---|---|---|
P-STMO-S (GT) | gt | 3 | 0.001 |
P-STMO-S | default | 3 | 0.0007 |
P-STMO | default | 4 | 0.001 |
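The per-model settings for both stages can be turned into command lines programmatically. The sketch below only assembles strings from the flags and values listed in this README. The dictionary layout and function names are illustrative, a -k value of default is taken to mean that the flag is omitted, and all flags not listed in the tables are assumed to stay as in the example commands.

```python
# Settings copied from the pre-training and fine-tuning tables ("k": None means default).
CONFIGS = {
    "P-STMO-S (GT)": {"k": "gt", "layers": "3", "tmr": "0.8", "smn": "7", "lr": "0.001"},
    "P-STMO-S": {"k": None, "layers": "3", "tmr": "0.8", "smn": "2", "lr": "0.0007"},
    "P-STMO": {"k": None, "layers": "4", "tmr": "0.6", "smn": "3", "lr": "0.001"},
}

def _keypoint_flag(cfg):
    """Return the -k arguments, or nothing when the default keypoints are used."""
    return ["-k", cfg["k"]] if cfg["k"] else []

def pretrain_command(model):
    """Stage I: masked pose modeling."""
    cfg = CONFIGS[model]
    args = ["python", "run.py", *_keypoint_flag(cfg), "-f", "243", "-b", "160",
            "--MAE", "--train", "1", "--layers", cfg["layers"], "-tds", "2",
            "-tmr", cfg["tmr"], "-smn", cfg["smn"], "--lr", "0.0001", "-lrd", "0.97"]
    return " ".join(args)

def finetune_command(model, stage1_checkpoint):
    """Stage II: fine-tuning from the Stage I checkpoint."""
    cfg = CONFIGS[model]
    args = ["python", "run.py", *_keypoint_flag(cfg), "-f", "243", "-b", "160",
            "--train", "1", "--layers", cfg["layers"], "-tds", "2",
            "--lr", cfg["lr"], "-lrd", "0.97", "--MAE_reload", "1",
            "--previous_dir", stage1_checkpoint]
    return " ".join(args)

for name in CONFIGS:
    print(pretrain_command(name))
    print(finetune_command(name, "your_best_model_in_stage_I.pth"))
```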
We only train and evaluate our P-STMO-S model on the MPI-INF-3DHP dataset, using the ground truth of 2D keypoints as inputs.
For the pre-training stage, please run:
python run_3dhp.py -f 81 -b 160 --MAE --train 1 --layers 3 -tmr 0.7 -smn 2 --lr 0.0001 -lrd 0.97
For the fine-tuning stage, please run:
python run_3dhp.py -f 81 -b 160 --train 1 --layers 3 --lr 0.0007 -lrd 0.97 --MAE_reload 1 --previous_dir your_best_model_in_stage_I.pth
To test our model on custom videos, you can use an off-the-shelf 2D keypoint detector (such as AlphaPose) to extract 2D poses from images and then use our model to lift them to 3D poses. These 2D keypoint detectors are trained on the COCO dataset, which orders human joints differently from Human3.6M, so our model needs to be re-trained to be compatible with them. The re-trained model takes 2D keypoints in COCO format, which can be downloaded from here, as inputs and outputs 3D joint positions in Human3.6M format.
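To make the joint-order mismatch concrete, the sketch below lists the 17 COCO keypoints next to a 17-joint Human3.6M ordering of the kind used in VideoPose3D-style code. The joint names and the Human3.6M ordering are written from general knowledge rather than taken from this repo, so treat them as assumptions and verify them against the data files.

```python
# COCO keypoint order (17 keypoints).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Assumed 17-joint Human3.6M order; names are illustrative labels.
H36M_JOINTS = [
    "pelvis", "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle", "spine", "thorax",
    "neck", "head", "left_shoulder", "left_elbow", "left_wrist",
    "right_shoulder", "right_elbow", "right_wrist",
]

shared = [name for name in H36M_JOINTS if name in COCO_KEYPOINTS]
h36m_only = [name for name in H36M_JOINTS if name not in COCO_KEYPOINTS]
coco_only = [name for name in COCO_KEYPOINTS if name not in H36M_JOINTS]

# Joints present in both skeletons but stored at different indices.
reordered = [n for n in shared if COCO_KEYPOINTS.index(n) != H36M_JOINTS.index(n)]

print("shared joints:", len(shared))
print("reordered joints:", len(reordered))
print("only in Human3.6M:", h36m_only)
print("only in COCO:", coco_only)
```

Under these assumptions every shared joint sits at a different index in the two formats, and several joints exist in only one of them, which is why a simple index remapping of detector output is not enough.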
You can use our pre-trained model PSTMOS_no_refine_48_5137_in_the_wild.pth or train our model from scratch using the following commands.
For the pre-training stage, please run:
python run_in_the_wild.py -k detectron_pt_coco -f 243 -b 160 --MAE --train 1 --layers 3 -tds 2 -tmr 0.8 -smn 2 --lr 0.0001 -lrd 0.97
For the fine-tuning stage, please run:
python run_in_the_wild.py -k detectron_pt_coco -f 243 -b 160 --train 1 --layers 3 -tds 2 --lr 0.0007 -lrd 0.97 --MAE_reload 1 --previous_dir your_best_model_in_stage_I.pth
After that, you can evaluate our models on in-the-wild videos using this repo. Please follow the instructions below.
- Follow their README.md to set up the code.
- Put the checkpoint in the checkpoint/ folder of their repo.
- Put the model/ folder and in_the_wild/videopose_PSTMO.py in the root path of their repo.
- Put in_the_wild/arguments.py, in_the_wild/generators.py, and in_the_wild/inference_3d.py in the common/ folder of their repo.
- Run videopose_PSTMO.py!
Note that the frame rate of the Human3.6M dataset is 50 fps, while most in-the-wild videos run at 25 or 30 fps. We therefore set tds=2 during training and tds=1 during testing.
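The effect of tds on the temporal footprint can be checked with a little arithmetic. The sketch assumes that tds is the stride, in original frames, between consecutive input frames. This reading and the helper names are assumptions, not taken from the code.

```python
def effective_fps(video_fps, tds):
    """Frame rate seen by the model when every tds-th frame is used."""
    return video_fps / tds

def window_seconds(num_frames, video_fps, tds):
    """Duration spanned by num_frames inputs sampled with stride tds."""
    return (num_frames - 1) * tds / video_fps

print(effective_fps(50, 2))        # 25.0, Human3.6M during training
print(effective_fps(25, 1))        # 25.0, a 25 fps video at test time
print(window_seconds(243, 50, 2))  # 9.68 seconds
print(window_seconds(243, 25, 1))  # 9.68 seconds
```

Under this assumption, a 243-frame input covers the same time span on 50 fps Human3.6M footage with tds=2 as on a 25 fps video with tds=1. For a 30 fps video the span is slightly shorter.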
If you find this repo useful, please consider citing our paper:
@article{shan2022p,
title={P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation},
author={Shan, Wenkang and Liu, Zhenhua and Zhang, Xinfeng and Wang, Shanshe and Ma, Siwei and Gao, Wen},
journal={arXiv preprint arXiv:2203.07628},
year={2022}
}
Our code refers to the following repositories.
We thank the authors for releasing their code.