SMPLer Logo

🔥 SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation 🔥

Xiangyu Xu, Lijuan Liu, Shuicheng Yan

TPAMI 2024 | arXiv | Xi'an Jiaotong University

Watch the video

Abstract

Existing Transformers for monocular 3D human shape and pose estimation typically have a quadratic computation and memory complexity with respect to the feature length, which hinders the exploitation of fine-grained information in high-resolution features that is beneficial for accurate reconstruction. In this work, we propose an SMPL-based Transformer framework (SMPLer) to address this issue. SMPLer incorporates two key ingredients: a decoupled attention operation and an SMPL-based target representation, which allow effective utilization of high-resolution features in the Transformer. In addition, based on these two designs, we also introduce several novel modules including a multi-scale attention and a joint-aware attention to further boost the reconstruction performance. Extensive experiments demonstrate the effectiveness of SMPLer against existing 3D human shape and pose estimation methods both quantitatively and qualitatively. Notably, the proposed algorithm achieves an MPJPE of 45.2mm on the Human3.6M dataset, improving upon Mesh Graphormer by more than 10% with fewer than one-third of the parameters.
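
To make the complexity argument concrete, the toy snippet below (a rough sketch for intuition only, not the decoupled attention actually used in SMPLer) contrasts full self-attention over an H×W feature map, whose attention matrix grows quadratically in H*W, with cross-attention from a small set of SMPL joint queries, which stays linear in H*W and therefore remains affordable for high-resolution features:

    # Toy illustration of the complexity argument (NOT the paper's decoupled
    # attention): a handful of SMPL joint queries attend to a flattened
    # high-resolution feature map, costing ~J*H*W instead of ~(H*W)^2.
    import torch

    B, C, H, W = 1, 64, 56, 56            # high-resolution backbone features
    J = 24                                # number of SMPL joint queries

    feats = torch.randn(B, H * W, C)      # flattened image tokens
    joint_queries = torch.randn(B, J, C)  # one query token per joint

    # Cross-attention map is (B, J, H*W); full self-attention would need (B, H*W, H*W)
    attn = torch.softmax(joint_queries @ feats.transpose(1, 2) / C ** 0.5, dim=-1)
    joint_feats = attn @ feats            # (B, J, C) joint-aware features
    print(joint_feats.shape)              # torch.Size([1, 24, 64])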

Highlights

  • Decoupled attention design
  • SMPL-based target representation
  • Significantly reduced computation and parameter overhead
  • Exploitation of high-resolution features
  • Multi-scale attention module and joint-aware attention module
  • MPJPE of 45.2mm on Human3.6M, surpassing the SOTA by over 10%

Results

Method          | Parameters (M) | MPJPE ↓ (Human3.6M) | PA-MPJPE ↓ (Human3.6M) | MPVE ↓ (3DPW) | MPJPE ↓ (3DPW) | PA-MPJPE ↓ (3DPW)
METRO           | 231.8          | 54.0                | 36.7                   | 88.2          | 77.1          | 47.9
Mesh Graphormer | 215.7          | 51.2                | 34.5                   | 87.7          | 74.7          | 45.6
SMPLer          | 35.6           | 47.0                | 32.8                   | 84.7          | 75.7          | 45.2
SMPLer-L        | 70.2           | 45.2                | 32.4                   | 82.0          | 73.7          | 43.4

All errors are reported in mm; lower (↓) is better.


1. Install

  • Hardware requirements

    For Testing: Most modern GPUs are adequate.

    For Training: It is recommended to use 2 NVIDIA A100 GPUs.

  • Create conda environment

    conda create -n smpler python=3.8
    conda activate smpler
  • Install packages

    pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
    
    pip install scipy==1.5.0 scikit-image==0.19.1 opencv-python==4.5.4.58 imageio matplotlib numpy==1.20.3 chumpy==0.70 ipython ipykernel ipdb smplx==0.1.28 tensorboardx==2.4 tensorboard==2.7.0 easydict pillow==8.4.0
  • Install Pytorch3D

    conda install -c fvcore -c iopath -c conda-forge fvcore iopath
    conda install -c bottler nvidiacub
    
    wget https://anaconda.org/pytorch3d/pytorch3d/0.5.0/download/linux-64/pytorch3d-0.5.0-py38_cu111_pyt180.tar.bz2 --no-check-certificate
    
    conda install pytorch3d-0.5.0-py38_cu111_pyt180.tar.bz2
    
    rm pytorch3d-0.5.0-py38_cu111_pyt180.tar.bz2
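
After the steps above, a quick sanity check like the following (a convenience snippet, not part of the repository) confirms that PyTorch, CUDA, and PyTorch3D are installed and visible:

    # Environment sanity check (convenience snippet, not part of the repo).
    import torch
    import torchvision
    import pytorch3d

    print("torch:", torch.__version__)          # expected 1.8.0+cu111
    print("torchvision:", torchvision.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("pytorch3d:", pytorch3d.__version__)  # expected 0.5.0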

2. How to Run

  • Download the meta data and extract it into "PATH_to_SMPLer/meta_data"

  • Download the pretrained models and extract them into "PATH_to_SMPLer/pretrained"

  • Run demo

    python demo.py --img_path samples/im01.png
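
To run the demo on a whole folder of images, a small wrapper like the one below can be used (a hypothetical convenience script; it assumes only the --img_path flag shown above):

    # Hypothetical batch wrapper around demo.py; assumes only the --img_path
    # flag used above and that images live under samples/.
    import glob
    import subprocess

    for img_path in sorted(glob.glob("samples/*.png")):
        subprocess.run(["python", "demo.py", "--img_path", img_path], check=True)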

3. Train and Test

3.1 Prepare Datasets

There are two ways to download the datasets: azcopy and wget.

Recommended way: azcopy (faster)

  • Download azcopy from here

  • Download datasets with azcopy:

    cd PATH_to_STORE_DATASET
    azcopy_path=PATH_to_AZCOPY
    bash PATH_to_SMPLer/scripts/download_datasets_azcopy.sh
  • Create a symbolic link:

    cd PATH_to_SMPLer
    ln -s PATH_to_STORE_DATASET ./datasets

Alternative way: wget (usually slower and less stable, but no dependency on azcopy)

  • Download datasets with wget:
    cd PATH_to_STORE_DATASET
    bash PATH_to_SMPLer/scripts/download_datasets_wget.sh

3.2 Test

  • Test on H36M dataset (models are trained on mixed data: Human3.6M, COCO, MuCo, UP-3D, and MPII)

    # for SMPLer
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --eval_only --val_batch_size=128 --model_type=smpler --data_mode=h36m --hrnet_type=w32 --load_checkpoint=pretrained/SMPLer_h36m.pt 
    
    # for SMPLer-L 
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --eval_only --val_batch_size=128 --model_type=smpler --data_mode=h36m --hrnet_type=w48 --load_checkpoint=pretrained/SMPLer-L_h36m.pt 
  • Test on 3DPW dataset (models are finetuned on 3DPW)

    # for SMPLer
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --eval_only --val_batch_size=128 --model_type=smpler --data_mode=3dpw --hrnet_type=w32 --load_checkpoint=pretrained/SMPLer_3dpw.pt 
    
    # for SMPLer-L 
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --eval_only --val_batch_size=128 --model_type=smpler --data_mode=3dpw --hrnet_type=w48 --load_checkpoint=pretrained/SMPLer-L_3dpw.pt 

3.3 Train

There are three stages for training.

  • For SMPLer:

    # 1. Train CNN backbone on mixed data
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --exp_name=backbone --batch_size=100 --num_workers=8 --lr=2e-4 --data_mode=h36m --model_type=backbone --num_epochs=50 --hrnet_type=w32  
    
    # 2. Train SMPLer on mixed data
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --exp_name=smpler --batch_size=100 --num_workers=8 --lr=2e-4 --data_mode=h36m --model_type=smpler --num_epochs=100 --hrnet_type=w32 --load_checkpoint=logs/backbone/checkpoints/epoch_049.pt
    
    # 3. Finetune SMPLer on 3DPW
    python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --exp_name=smpler_3dpw --batch_size=32 --num_workers=8 --lr=1e-4 --data_mode=3dpw --model_type=smpler --num_epochs=2 --hrnet_type=w32 --load_checkpoint=logs/smpler/checkpoints/epoch_***.pt --summary_steps=100
  • For SMPLer-L:

    # 1. Train CNN backbone on mixed data
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --exp_name=backbone-L --batch_size=100 --num_workers=8 --lr=2e-4 --data_mode=h36m --model_type=backbone --num_epochs=50 --hrnet_type=w48  
    
    # 2. Train SMPLer-L on mixed data
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --exp_name=smpler-L --batch_size=100 --num_workers=8 --lr=2e-4 --data_mode=h36m --model_type=smpler --num_epochs=100 --hrnet_type=w48 --load_checkpoint=logs/backbone-L/checkpoints/epoch_049.pt
    
    # 3. Finetune SMPLer-L on 3DPW
    python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --exp_name=smpler-L_3dpw --batch_size=32 --num_workers=8 --lr=1e-4 --data_mode=3dpw --model_type=smpler --num_epochs=2 --hrnet_type=w48 --load_checkpoint=logs/smpler-L/checkpoints/epoch_***.pt --summary_steps=100

4. BibTex

If you find SMPLer useful in your work, please cite:

@article{xu2024smpler,
  title={SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation},
  author={Xu, Xiangyu and Liu, Lijuan and Yan, Shuicheng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}

5. Related Resources

Please explore these resources to broaden your understanding of 3D human modeling: METRO, Mesh Graphormer, RSC-Net, Texformer, Sewformer, GP-NeRF