Progressively-Generating-Better-Initial-Guesses-Towards-Next-Stages-forHigh-Quality-Human-Motion-Prediction

Official implementation of Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction (CVPR 2022 paper)

[PDF] [Supp] [Demo]

Authors

Tiezheng Ma, School of Computer Science and Engineering, South China University of Technology, China, mtz705062791@gmail.com
Yongwei Nie, School of Computer Science and Engineering, South China University of Technology, China, nieyongwei@scut.edu.cn
Chengjiang Long, Meta Reality Labs, USA, clong1@fb.com
Qing Zhang, School of Computer Science and Engineering, Sun Yat-sen University, China, zhangqing.whu.cs@gmail.com
Guiqing Li, School of Computer Science and Engineering, South China University of Technology, China, ligq@scut.edu.cn

Abstract

This paper presents a high-quality human motion prediction method that accurately predicts future human poses given observed ones. Our method is mainly based on the observation that a good initial guess of the future pose sequence, such as the mean of future poses, is very helpful for improving forecasting accuracy. This motivates us to design a novel two-stage prediction strategy, comprising an init-prediction network that just computes a good initial guess and a formal-prediction network that takes both the historical and initial poses to predict the target pose sequence. We extend this idea further and design a multi-stage prediction framework in which each stage predicts an initial guess for the next stage, which yields a significant performance gain. To fulfill the prediction task at each stage, we propose a network comprising Spatial Dense Graph Convolutional Networks (S-DGCN) and Temporal Dense Graph Convolutional Networks (T-DGCN). Executing the two networks sequentially effectively extracts spatiotemporal features over the global receptive field of the whole pose sequence. Together, these design choices make our method outperform previous approaches by a large margin (6%-7% on Human3.6M, 5%-10% on CMU-MoCap, 13%-16% on 3DPW).
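The "mean of future poses" idea from the abstract can be illustrated with a small, dependency-free sketch. This is not the repository's code: the function names (`mean_pose`, `initial_guess`, `formal_stage_input`) are hypothetical, poses are flattened to plain lists of floats, and the guess is assumed to be the mean future pose repeated once per future frame.

```python
from typing import List

Pose = List[float]  # one frame: flattened joint coordinates


def mean_pose(frames: List[Pose]) -> Pose:
    """Average a sequence of frames coordinate by coordinate."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def initial_guess(future: List[Pose]) -> List[Pose]:
    """Hypothetical init-stage target: the mean future pose, repeated per frame."""
    mean = mean_pose(future)
    return [list(mean) for _ in future]


def formal_stage_input(history: List[Pose], guess: List[Pose]) -> List[Pose]:
    """Assumed input layout for the formal stage: observed frames, then the guess."""
    return history + guess


# Toy data: 2 observed frames and 2 future frames with 2 coordinates each.
history = [[0.0, 0.0], [1.0, 1.0]]
future = [[2.0, 2.0], [4.0, 6.0]]

guess = initial_guess(future)
stage_input = formal_stage_input(history, guess)
print(guess)
print(len(stage_input))
```

In the sketch, the guess is computed from ground-truth future frames, which is only possible when building a training target. At test time the init-prediction network would have to produce it from the observed poses alone.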

Dependencies

PyTorch 1.8.0+cu11
Python 3.7
NVIDIA RTX 2060

Dataset

Human3.6M in exponential map format can be downloaded from here.

CMU-MoCap was obtained from the repository of the ConvSeq2Seq paper.

3DPW can be obtained from its official website.

Train

Train on Human3.6M:

python main_h36m.py --data_dir [dataset path] --dct_n 35 --input_n 10 --output_n 25 --skip_rate 1 --batch_size 16 --test_batch_size 32 --in_features 66 --cuda_idx cuda:0 --d_model 16 --lr_now 0.005 --epoch 50 --test_sample_num -1

Train on CMU-MoCap:

python main_cmu_3d.py --data_dir [dataset path] --dct_n 35 --input_n 10 --output_n 25 --skip_rate 1 --batch_size 16 --test_batch_size 32 --in_features 75 --cuda_idx cuda:0 --d_model 16 --lr_now 0.005 --epoch 50 --test_sample_num -1

Train on 3DPW:

--data_dir [dataset path] --dct_n 40 --input_n 10 --output_n 30 --skip_rate 1 --batch_size 32 --test_batch_size 32 --in_features 69 --cuda_idx cuda:0 --d_model 16 --lr_now 0.005 --epoch 50 --test_sample_num -1

Note:

d_model: the latent code dimension of a joint.
test_sample_num: the number of samples drawn per action for the test set. It can be set to 8, 256, or -1 (all). For example, if it is set to 8, 8 samples are drawn for each action to form the test set.
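The per-action sampling that test_sample_num describes can be sketched as follows. This is an illustration, not the repository's implementation: the function name `sample_test_indices`, the fixed seed, and the choice to keep everything when an action has fewer samples than requested are all assumptions.

```python
import random
from typing import Dict, List


def sample_test_indices(
    indices_by_action: Dict[str, List[int]],
    test_sample_num: int,
    seed: int = 0,  # assumption: a fixed seed so the test subset is reproducible
) -> Dict[str, List[int]]:
    """Pick test_sample_num sample indices per action; -1 keeps all of them."""
    rng = random.Random(seed)
    selected: Dict[str, List[int]] = {}
    for action, indices in indices_by_action.items():
        if test_sample_num < 0 or test_sample_num >= len(indices):
            # -1 means "all"; also keep everything if the pool is too small.
            selected[action] = list(indices)
        else:
            selected[action] = sorted(rng.sample(indices, test_sample_num))
    return selected


# Toy pools of candidate sample indices for two actions.
pool = {"walking": list(range(100)), "eating": list(range(5))}
picked = sample_test_indices(pool, 8)
print({action: len(chosen) for action, chosen in picked.items()})
```

With test_sample_num set to 8, each action with at least 8 candidates contributes exactly 8 samples, so evaluation cost grows with the number of actions rather than with the size of the dataset.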

After training, the checkpoint is saved in ./checkpoint/.

Test

Add --is_eval after the above training commands.

The test results are saved in ./checkpoint/.
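The training and test commands differ only in a few values, so they can be assembled programmatically. The sketch below is a convenience helper, not part of the repository: `build_command` and `DATASETS` are hypothetical names, and all flag values are copied from the Human3.6M and CMU-MoCap commands in this README. 3DPW is left out because its command does not name a script.

```python
from typing import Dict, List, Union

# Per-dataset values copied from the training commands in this README.
DATASETS: Dict[str, Dict[str, Union[str, int]]] = {
    "h36m": {"script": "main_h36m.py", "in_features": 66},
    "cmu": {"script": "main_cmu_3d.py", "in_features": 75},
}

INPUT_N = 10   # observed frames
OUTPUT_N = 25  # predicted frames


def build_command(dataset: str, data_dir: str, is_eval: bool = False) -> List[str]:
    """Return the argument list for training, or for testing if is_eval is True."""
    cfg = DATASETS[dataset]
    args = [
        "python", str(cfg["script"]),
        "--data_dir", data_dir,
        # In both commands dct_n equals input_n + output_n; treating that
        # as a general rule is an assumption.
        "--dct_n", str(INPUT_N + OUTPUT_N),
        "--input_n", str(INPUT_N),
        "--output_n", str(OUTPUT_N),
        "--skip_rate", "1",
        "--batch_size", "16",
        "--test_batch_size", "32",
        "--in_features", str(cfg["in_features"]),
        "--cuda_idx", "cuda:0",
        "--d_model", "16",
        "--lr_now", "0.005",
        "--epoch", "50",
        "--test_sample_num", "-1",
    ]
    if is_eval:
        # Testing reuses the training command with --is_eval appended.
        args.append("--is_eval")
    return args


print(" ".join(build_command("h36m", "/path/to/h36m", is_eval=True)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues when the dataset path contains spaces.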

Citation

If you find our work helpful, please cite our paper.

Ma T, Nie Y, Long C, et al. Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 6437-6446.

Acknowledgments

Our code is based on HisRep and LearnTrajDep.

Licence

MIT
