Our paper has been published in IEEE Transactions on Multimedia: Frame-Padded Multiscale Transformer for Monocular 3D Human Pose Estimation.
This work builds on VideoPose3D and MixSTE; see those repositories for additional help.
Test on Human3.6M
The code was developed and tested in the following environment:
- Ubuntu 20.04
- Python 3.9.16
- PyTorch 1.13.1
- CUDA 11.7
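To confirm that a local setup matches the versions listed above, a small check like the following may help. It is only a sketch and not part of this repository: the helper names (`same_major_minor`, `collect_versions`, `report`) are hypothetical, and it assumes that matching the major and minor version is close enough.

```python
import importlib
import platform

# Versions listed in this README.
EXPECTED = {"python": "3.9.16", "torch": "1.13.1", "cuda": "11.7"}


def same_major_minor(found, expected):
    """Compare only the first two version components (e.g. 3.9.x vs 3.9.16)."""
    return found.split(".")[:2] == expected.split(".")[:2]


def collect_versions():
    """Gather local versions; torch and CUDA are skipped if torch is absent."""
    versions = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return versions
    # Drop local build suffixes such as "+cu117".
    versions["torch"] = str(torch.__version__).split("+")[0]
    if torch.version.cuda is not None:
        versions["cuda"] = torch.version.cuda
    return versions


def report(found, expected=EXPECTED):
    """Return one human-readable status line per expected component."""
    lines = []
    for name, want in expected.items():
        have = found.get(name)
        if have is None:
            status = "not found"
        elif same_major_minor(have, want):
            status = "ok"
        else:
            status = "differs"
        lines.append(f"{name}: found {have}, README lists {want} ({status})")
    return lines


if __name__ == "__main__":
    print("\n".join(report(collect_versions())))
```

A "differs" line does not necessarily mean the code will fail; it only flags a departure from the tested configuration.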
The dataset setup follows VideoPose3D. Please refer to that repository to prepare the Human3.6M dataset (under the ./data directory).
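Before running evaluation or training, it can be useful to check that the prepared files are in place. The sketch below assumes this repository keeps VideoPose3D's file naming (`data_3d_<dataset>.npz` and `data_2d_<dataset>_<keypoints>.npz`); that naming and the helper functions are assumptions for illustration, so adjust them if the layout here differs.

```python
from pathlib import Path


def expected_dataset_files(data_dir="data", dataset="h36m",
                           keypoints="cpn_ft_h36m_dbb"):
    """Paths the loader is assumed to read (VideoPose3D naming convention)."""
    root = Path(data_dir)
    return [
        root / f"data_3d_{dataset}.npz",              # 3D ground-truth poses
        root / f"data_2d_{dataset}_{keypoints}.npz",  # 2D detections (-k flag)
    ]


def missing_dataset_files(data_dir="data", dataset="h36m",
                          keypoints="cpn_ft_h36m_dbb"):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_dataset_files(data_dir, dataset, keypoints)
            if not p.is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing dataset files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected dataset files are present.")
```

The `keypoints` default mirrors the `-k cpn_ft_h36m_dbb` argument used in the commands in this README.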
- Download our pretrained model from Google Drive.
Then run the command below to evaluate with 243-frame input:
python run.py -k cpn_ft_h36m_dbb -c <checkpoint_path> --evaluate <checkpoint_file> -f 243 -s 243 --edgepad 81
To train FMFormer on multiple GPUs:
python run.py -k cpn_ft_h36m_dbb -f 243 -s 243 --edgepad 81 -l log/run -c checkpoint -gpu 0,1
We thank the authors of the following baselines, on which our code is built:
- VideoPose3D
- MixSTE