Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video [CVPR 2022]
Our Motion Pose and Shape Network (MPS-Net) effectively captures humans in motion to estimate accurate and temporally coherent 3D human pose and shape from a monocular video.
Please refer to our arXiv report for further details.
Check our YouTube video below for a 5-minute presentation of our work.
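At its core, MPS-Net attends over per-frame features across time to produce temporally coherent predictions. The snippet below is a minimal, generic sketch of temporal self-attention over per-frame features in PyTorch; the 2048-D feature size and other hyper-parameters are assumptions for illustration, and this is not the authors' actual attention module.

```python
# Illustrative sketch only (not the MPS-Net module): temporal self-attention
# over per-frame CNN features of a video clip.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, feat_dim=2048, num_heads=8):  # assumed sizes
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, x):
        # x: (batch, seq_len, feat_dim) per-frame features
        attended, _ = self.attn(x, x, x)   # attend across the temporal axis
        return self.norm(x + attended)     # residual connection + layer norm

# Example: a batch of 4 clips, 16 frames each, 2048-D features per frame
feats = torch.randn(4, 16, 2048)
out = TemporalAttention()(feats)
print(out.shape)  # torch.Size([4, 16, 2048])
```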
# Clone the repo:
git clone https://github.com/MPS-Net/MPS-Net_release.git
# Install the requirements using `virtualenv`:
cd $PWD/MPS-Net_release
source scripts/install_pip.sh
or
# Download and install Anaconda on Windows: https://www.anaconda.com/products/distribution#windows
# Install Git Bash (run the following in a Windows command prompt, cmd):
winget install --id Git.Git -e --source winget
# Launch Git Bash
start "" "%PROGRAMFILES%\Git\bin\sh.exe" --login
# Clone the repo:
git clone https://github.com/MPS-Net/MPS-Net_release.git
# Install the requirements using `conda`:
cd MPS-Net_release
source scripts/install_conda.sh
To download the required data and the pre-trained MPS-Net model, you can simply run:
source scripts/get_base_data.sh
or
Alternatively, you can download the required data and the pre-trained MPS-Net model from here. Unzip the contents so that the data directory follows the hierarchy below.
${ROOT}
|-- data
| |-- base_data
| |-- preprocessed_data
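If you fetched the data manually, a quick sanity check such as the one below can confirm the layout; the paths are taken from the hierarchy above, and `ROOT` is an assumption you should adjust if you unzipped elsewhere.

```python
# Quick sanity check of the expected data layout.
import os

ROOT = "."  # adjust if the repo/data live elsewhere
for sub in ("data/base_data", "data/preprocessed_data"):
    path = os.path.join(ROOT, sub)
    print(f"{path}: {'OK' if os.path.isdir(path) else 'MISSING'}")
```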
Run the command below to evaluate the pre-trained model on the 3DPW test set.
# dataset: 3dpw
python evaluate.py --dataset 3dpw --cfg ./configs/repr_table1_3dpw_model.yaml --gpu 0
You should be able to obtain the output below:
PA-MPJPE: 52.1, MPJPE: 84.3, MPVPE: 99.7, ACC-ERR: 7.4
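For reference, MPJPE is the mean Euclidean distance (in mm) between predicted and ground-truth 3D joints, PA-MPJPE is the same error after Procrustes alignment, MPVPE is the per-vertex analogue on the mesh surface, and ACC-ERR measures acceleration error across frames. The NumPy sketch below shows how MPJPE and acceleration error are commonly computed; the array shapes are assumptions and this is not the repository's evaluation code.

```python
# Minimal sketch of two of the reported metrics (not the repo's evaluation code).
# pred/gt: (num_frames, num_joints, 3) joint positions in mm.
import numpy as np

def mpjpe(pred, gt):
    # Mean Euclidean distance per joint, averaged over joints and frames.
    return np.linalg.norm(pred - gt, axis=-1).mean()

def accel_error(pred, gt):
    # Acceleration via second-order finite differences along the time axis,
    # then mean distance between predicted and ground-truth accelerations.
    accel_pred = pred[:-2] - 2 * pred[1:-1] + pred[2:]
    accel_gt = gt[:-2] - 2 * gt[1:-1] + gt[2:]
    return np.linalg.norm(accel_pred - accel_gt, axis=-1).mean()

pred = np.random.randn(16, 14, 3)
gt = np.random.randn(16, 14, 3)
print(mpjpe(pred, gt), accel_error(pred, gt))
```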
We have prepared demo code to run MPS-Net on arbitrary videos. To do so, run:
python demo.py --vid_file sample_video.mp4 --gpu 0
sample_video.mp4 demo output:
python demo.py --vid_file sample_video2.mp4 --gpu 0
sample_video2.mp4 demo output:
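To process several videos in one go, a simple wrapper like the one below works; the `videos/` folder name is only an example, while the `--vid_file` and `--gpu` flags are the same ones used above.

```python
# Run the demo on every .mp4 in a folder (folder name is an example only).
import glob
import subprocess

for vid in sorted(glob.glob("videos/*.mp4")):
    subprocess.run(["python", "demo.py", "--vid_file", vid, "--gpu", "0"], check=True)
```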
@inproceedings{WeiLin2022mpsnet,
  title     = {Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video},
  author    = {Wei, Wen-Li and Lin, Jen-Chun and Liu, Tyng-Luh and Liao, Hong-Yuan Mark},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2022}
}
This project is licensed under the terms of the MIT license.
The base code is largely borrowed from the great resources VIBE and TCMR.