
GLoT: Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation (CVPR 2023)

Introduction

This repository is the official Pytorch implementation of Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation.

The base code is largely borrowed from VIBE and TCMR.

(Figure: overview of the GLoT framework.)

See our paper for more details.

Results

Here we report the performance of GLoT.

(Tables 1 and 2: quantitative results, shown as images; see the paper for the full numbers.)

Running GLoT

Installation

conda create -n glot python=3.7 -y
conda activate glot
pip install torch==1.4.0 torchvision==0.5.0
pip install -r requirements.txt
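
To sanity-check the environment, a short Python snippet like the one below can confirm that the pinned torch/torchvision versions were installed and that a GPU is visible (this is just a convenience check, not part of the official setup):

import torch
import torchvision

# The versions should match the pins from the install commands above.
print("torch:", torch.__version__)              # expected: 1.4.0
print("torchvision:", torchvision.__version__)  # expected: 0.5.0
print("CUDA available:", torch.cuda.is_available())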

Data preparation

  1. Download base_data and the SMPL pkl files (male, female, and neutral), and put them into ${ROOT}/data/base_data/. Rename each SMPL pkl to the SMPL_{GENDER}.pkl format. For example, mv basicModel_neutral_lbs_10_207_0_v1.0.0.pkl SMPL_NEUTRAL.pkl.

  2. Download the data provided by TCMR (except the InstaVariety dataset). The pre-processed InstaVariety data is uploaded by the VIBE authors here. Put them into ${ROOT}/data/preprocessed_data/.

  3. Download the pretrained models for testing, and put them into ${ROOT}/data/pretrained_models/.

  4. Download the images (e.g., 3DPW) for rendering, and put them into ${ROOT}/data/3dpw/.

The data directory structure should follow the hierarchy below.

${ROOT}  
|-- data  
  |-- base_data  
    |-- J_regressor_extra.npy  
    |-- ...
  |-- preprocessed_data
    |-- 3dpw_train_db.pt
    |-- ...
  |-- pretrained_models
    |-- table1_3dpw_weights.pth.tar
    |-- ...
  |-- 3dpw
    |-- imageFiles
      |-- courtyard_arguing_00
      |-- ...
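
To catch missing files early, the layout can be verified with a short script. This is a minimal sketch; the expected paths below only cover the entries shown in the hierarchy above, so extend the list for your own setup:

import os

ROOT = "."  # adjust to your repository root

# Representative paths taken from the hierarchy above; extend as needed.
expected = [
    "data/base_data/J_regressor_extra.npy",
    "data/preprocessed_data/3dpw_train_db.pt",
    "data/pretrained_models/table1_3dpw_weights.pth.tar",
    "data/3dpw/imageFiles/courtyard_arguing_00",
]

for rel in expected:
    status = "ok" if os.path.exists(os.path.join(ROOT, rel)) else "MISSING"
    print(f"{status:7s} {rel}")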

Evaluation

  • Run the evaluation code with the corresponding config file to reproduce the numbers reported in the tables of our paper.
# Table1 3dpw
python evaluate.py --dataset 3dpw --cfg ./configs/repr_table1_3dpw.yaml --gpu 0 
# Table1 h36m
python evaluate.py --dataset h36m --cfg ./configs/repr_table1_h36m_mpii3d.yaml --gpu 0
# Table1 mpii3d
python evaluate.py --dataset mpii3d --cfg ./configs/repr_table1_h36m_mpii3d.yaml --gpu 0

# Table2 3dpw
python evaluate.py --dataset 3dpw --cfg ./configs/repr_table2_3dpw.yaml --gpu 0 

# for rendering 
python evaluate.py --dataset 3dpw --cfg ./configs/repr_table1_3dpw.yaml --gpu 0 --render
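
If an evaluation run fails to load weights, inspecting the checkpoint file can help narrow things down. A minimal sketch, assuming the .pth.tar files are ordinary torch.save archives (checkpoint layouts differ between repositories, so the script prints the keys rather than assuming any):

import torch

# Load on CPU so no GPU is required just for inspection.
ckpt = torch.load(
    "data/pretrained_models/table1_3dpw_weights.pth.tar",
    map_location="cpu",
)

if isinstance(ckpt, dict):
    print("checkpoint keys:", list(ckpt.keys()))
else:
    print("checkpoint type:", type(ckpt))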

Reproduction (Training)

  • Run the training code with the corresponding config file to reproduce the numbers reported in the tables of our paper.
# Table1 3dpw
python train_cosine_trans.py --cfg ./configs/repr_table1_3dpw.yaml --gpu 0 

# Table1 h36m & mpii3d
python train_cosine_trans.py --cfg ./configs/repr_table1_h36m_mpii3d.yaml --gpu 0 

# Table2 3dpw
python train_cosine_trans.py --cfg ./configs/repr_table2_3dpw.yaml --gpu 0 
  • After training, set TRAIN.PRETRAINED in the config file to the checkpoint path (either checkpoint.pth.tar or model_best.pth.tar), then run the evaluation command above.
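
If you prefer not to hand-edit the YAML, the field can be rewritten with a small script. A minimal sketch, assuming the configs are plain YAML files with a TRAIN.PRETRAINED entry (the checkpoint path below is a placeholder for your own run):

import yaml

cfg_path = "./configs/repr_table1_3dpw.yaml"
ckpt_path = "path/to/your/model_best.pth.tar"  # placeholder: point at your run

with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# Overwrite the pretrained-checkpoint entry and write the config back.
cfg.setdefault("TRAIN", {})["PRETRAINED"] = ckpt_path

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, default_flow_style=False)

Note that yaml.safe_dump rewrites the whole file, so any comments in the original config are lost; hand-editing preserves them.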

Quick demo

  • Prepare a video (e.g., demo.mp4) and run the following command.
python demo.py --vid_file demo.mp4 --gpu 0 --cfg ./configs/repr_table1_3dpw.yaml 
  • The results will be saved in ./demo_output/demo/.
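
To process several videos in one go, the demo command can be wrapped in a loop. A minimal sketch that uses only the flags shown above; the videos/ input directory is an assumption, so adjust it to wherever your clips live:

import glob
import subprocess

# Assumed input directory; each .mp4 is run through the demo in turn.
for vid in sorted(glob.glob("videos/*.mp4")):
    subprocess.run(
        [
            "python", "demo.py",
            "--vid_file", vid,
            "--gpu", "0",
            "--cfg", "./configs/repr_table1_3dpw.yaml",
        ],
        check=True,
    )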

Reference

@inproceedings{shen2023global,
  title={Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation},
  author={Shen, Xiaolong and Yang, Zongxin and Wang, Xiaohan and Ma, Jianxin and Zhou, Chang and Yang, Yi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8887--8896},
  year={2023}
}

License

This project is licensed under the terms of the MIT license.