ygx7/FMFormer

The Frame-padded Multi-scale Transformer for Monocular 3D Human Pose Estimation

FMFormer: Frame-padded Multi-scale Transformer for Monocular 3D Human Pose Estimation

Our paper has been published in IEEE Transactions on Multimedia: Frame-Padded Multiscale Transformer for Monocular 3D Human Pose Estimation.

This work builds on VideoPose3D and MixSTE; refer to those repositories for further help.

Test on Human3.6M

Environment

The code was developed and tested in the following environment:

Ubuntu 20.04
Python 3.9.16
PyTorch 1.13.1
CUDA 11.7
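
To compare a local setup against the versions listed above, a small check like the following may help. It is a minimal sketch and not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and the comparison only looks at major and minor version numbers, on the assumption that patch-level differences are harmless.

```python
import platform
import sys


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    # Drop local build suffixes such as "+cu117" before splitting on dots.
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_environment(expected_python=(3, 9), expected_torch=(1, 13), expected_cuda=(11, 7)):
    """Map each component to (found_version, matches_listed_major_minor)."""
    report = {
        "python": (platform.python_version(), tuple(sys.version_info[:2]) == expected_python)
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so neither it nor its CUDA build can be checked.
        report["torch"] = (None, False)
        report["cuda"] = (None, False)
        return report
    torch_version = str(torch.__version__)
    report["torch"] = (torch_version, parse_version(torch_version)[:2] == expected_torch)
    cuda = torch.version.cuda  # None for CPU-only builds of PyTorch
    report["cuda"] = (cuda, cuda is not None and parse_version(cuda)[:2] == expected_cuda)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "matches" if ok else "differs from"
        print(f"{name}: {found} ({status} the listed version)")
```

A mismatch here does not necessarily mean the code will fail; it only indicates that the setup differs from the one the authors report.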

Dataset

The dataset setup follows VideoPose3D. Please refer to its instructions to prepare the Human3.6M dataset (under the ./data directory).
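
Before running evaluation or training, it can be useful to confirm that the prepared files are in place. The sketch below assumes the file naming used by the VideoPose3D data preparation (`data_3d_h36m.npz` for the 3D poses and `data_2d_h36m_<keypoints>.npz` for the 2D detections); the helper names are hypothetical, and the file names should be adjusted if this repository expects something different.

```python
from pathlib import Path


def expected_files(keypoints="cpn_ft_h36m_dbb", dataset="h36m"):
    """File names assumed from the VideoPose3D convention (an assumption, not verified here)."""
    return [f"data_3d_{dataset}.npz", f"data_2d_{dataset}_{keypoints}.npz"]


def missing_dataset_files(data_dir="data", keypoints="cpn_ft_h36m_dbb"):
    """Return the expected files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected_files(keypoints) if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("All expected dataset files were found under ./data.")
```

The `keypoints` default matches the `-k cpn_ft_h36m_dbb` argument used in the commands in this README.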

Evaluation

Download our pretrained model from Google Drive.

Then run the command below (evaluation with 243-frame input):

python run.py -k cpn_ft_h36m_dbb -c <checkpoint_path> --evaluate <checkpoint_file> -f 243 -s 243 --edgepad 81

Training from scratch

To train FMFormer on multiple GPUs (GPUs 0 and 1 in this example):

python run.py -k cpn_ft_h36m_dbb -f 243 -s 243 --edgepad 81 -l log/run -c checkpoint -gpu 0,1

Acknowledgement

We thank the authors of the following baselines, on which our code is built:

VideoPose3D
MixSTE