Learning Temporal Pose Estimation from Sparsely Labeled Videos (NeurIPS 2019)

Introduction

This is an official PyTorch implementation of Learning Temporal Pose Estimation from Sparsely Labeled Videos. In this work, we introduce a framework that reduces the need for densely labeled video data while still producing strong pose detection performance. Our approach is useful even when training videos are densely labeled, which we demonstrate by obtaining state-of-the-art pose detection results on the PoseTrack17 and PoseTrack18 datasets. Our method, called PoseWarper, is currently ranked first for multi-frame person pose estimation on the PoseTrack leaderboard.

Results on the PoseTrack Dataset

Temporal Pose Aggregation during Inference

Method	Dataset Split	Head	Shoulder	Elbow	Wrist	Hip	Knee	Ankle	Mean
PoseWarper	val17	81.4	88.3	83.9	78.0	82.4	80.5	73.6	81.2
PoseWarper	test17	79.5	84.3	80.1	75.8	77.6	76.8	70.8	77.9
PoseWarper	val18	79.9	86.3	82.4	77.5	79.8	78.8	73.2	79.7
PoseWarper	test18	78.9	84.4	80.9	76.8	75.6	77.5	71.8	78.0

Video Pose Propagation on PoseTrack17

Method	Head	Shoulder	Elbow	Wrist	Hip	Knee	Ankle	Mean
Pseudo-labeling w/HRNet	79.1	86.5	81.4	74.7	81.4	79.4	72.3	79.3
FlowNet2 Propagation	82.7	91.0	83.8	78.4	89.7	83.6	78.1	83.8
PoseWarper	86.0	92.7	89.5	86.0	91.5	89.1	86.6	88.7

Environment

The code was developed using Python 3.7, PyTorch 1.1.0, and CUDA 10.0.1 on Ubuntu 18.04. For our experiments, we used 8 NVIDIA P100 GPUs.

License

PoseWarper is released under the Apache 2.0 license.

Quick start

Installation

Create a conda virtual environment and activate it:

conda create -n posewarper python=3.7 -y
source activate posewarper

Install pytorch v1.1.0:

conda install pytorch=1.1.0 torchvision -c pytorch

Install mmcv:
```
pip install mmcv
```

Install COCOAPI:

# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
python setup.py install --user

Clone this repo. Let's refer to it as ${POSEWARPER_ROOT}:

git clone https://github.com/facebookresearch/PoseWarper.git

Install other dependencies:

cd ${POSEWARPER_ROOT}
pip install -r requirements.txt

Compile external modules:

cd ${POSEWARPER_ROOT}/lib
make
cd ${POSEWARPER_ROOT}/lib/deform_conv
python setup.py develop

Download our pretrained models and some supplementary data files from this link, and extract them to the ${POSEWARPER_SUPP_ROOT} directory.

Data preparation

For PoseTrack17 data, we use a slightly modified version of the PoseTrack dataset in which the frames are renamed to follow the %08d format, with the first frame indexed as 1 (i.e. 00000001.jpg). First, download the data from the PoseTrack download page. Then, rename the frames of each video as described above using this script.
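
The renaming convention can be illustrated with a minimal sketch. This is not the script linked above; the function name `rename_frames` is hypothetical, and the sketch assumes that the original frame names of a video sort alphabetically into the correct temporal order.

```python
from pathlib import Path


def rename_frames(video_dir, ext=".jpg"):
    """Rename the frames in video_dir to 00000001.jpg, 00000002.jpg, ...

    Assumption: sorting the original file names alphabetically gives the
    correct temporal order (true for zero-padded names).
    """
    video_dir = Path(video_dir)
    frames = sorted(p for p in video_dir.iterdir() if p.suffix.lower() == ext)

    # Pass 1: move every frame to a temporary name so that a new name can
    # never overwrite a frame that has not been renamed yet.
    staged = []
    for index, frame in enumerate(frames, start=1):
        tmp = video_dir / f"tmp_{index:08d}{ext}.part"
        frame.rename(tmp)
        staged.append((tmp, video_dir / f"{index:08d}{ext}"))

    # Pass 2: move the temporary files to their final %08d names.
    for tmp, final in staged:
        tmp.rename(final)
    return [final.name for _, final in staged]
```

Calling `rename_frames` once per video directory would leave the first frame named 00000001.jpg. It is safest to try it on a copy of the data first.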

We provide all the required JSON files, which have already been converted to COCO format. Evaluation is performed using the official PoseTrack evaluation code, poseval, which uses py-motmetrics internally. We also provide the MAT/JSON files that are required for the evaluation.

Your extracted PoseTrack17 images directory should look like this:

${POSETRACK17_IMG_DIR}
|-- bonn
|-- bonn_5sec
|-- bonn_mpii_test_5sec
|-- bonn_mpii_test_v2_5sec
|-- bonn_mpii_train_5sec
|-- bonn_mpii_train_v2_5sec
|-- mpii
`-- mpii_5sec
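
Before launching the experiments, it can help to confirm that the extracted directory matches the layout above. The following is a minimal sketch of such a check; the names `POSETRACK17_SUBDIRS` and `missing_subdirs` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Sub-directories listed in the PoseTrack17 layout above.
POSETRACK17_SUBDIRS = [
    "bonn",
    "bonn_5sec",
    "bonn_mpii_test_5sec",
    "bonn_mpii_test_v2_5sec",
    "bonn_mpii_train_5sec",
    "bonn_mpii_train_v2_5sec",
    "mpii",
    "mpii_5sec",
]


def missing_subdirs(img_dir, expected=POSETRACK17_SUBDIRS):
    """Return the expected sub-directories that are absent from img_dir."""
    root = Path(img_dir)
    return [name for name in expected if not (root / name).is_dir()]
```

An empty list means every expected sub-directory was found. The same function could be reused for another layout by passing a different `expected` list.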

For PoseTrack18 data, please download the data from the PoseTrack download page. Since the video frames are already named properly, you only need to extract them into a directory of your choice (there is no need to rename them). As with PoseTrack17, we provide all required JSON files for the PoseTrack18 dataset.

Your extracted PoseTrack18 images directory should look like this:

${POSETRACK18_IMG_DIR}
`-- images
    |-- test
    |-- train
    `-- val

PoseTrack17 Experiments

First, you will need to modify scripts/posetrack17_helper.py by setting appropriate path variables:

#### environment variables
cur_python = '/path/to/your/python/binary'
working_dir = '/path/to/PoseWarper/'

### supplementary files
root_dir = '/path/to/our/provided/supplementary/files/directory/'

### directory with extracted and renamed frames
img_dir = '/path/to/posetrack17/renamed_images/'

Here, working_dir should be the same as ${POSEWARPER_ROOT}, root_dir should be set to ${POSEWARPER_SUPP_ROOT}, and img_dir should point to ${POSETRACK17_IMG_DIR}.

After that, you can run the following PoseTrack17 experiments. All output files, including the trained models, will be saved in the ${POSEWARPER_SUPP_ROOT}/posetrack17_experiments/ directory.

Video Pose Propagation

cd ${POSEWARPER_ROOT}
python scripts/posetrack17_helper.py 1

Data Augmentation with PoseWarper

cd ${POSEWARPER_ROOT}
python scripts/posetrack17_helper.py 2

Comparison to State-of-the-Art

cd ${POSEWARPER_ROOT}
python scripts/posetrack17_helper.py 3

All of the above experiments

cd ${POSEWARPER_ROOT}
python scripts/posetrack17_helper.py 0

PoseTrack18 Experiments

First, you will need to modify scripts/posetrack18_helper.py by setting appropriate path variables:

#### environment variables
cur_python = '/path/to/your/python/binary'
working_dir = '/path/to/PoseWarper/'

### supplementary files
root_dir = '/path/to/our/provided/supplementary/files/directory/'

### directory with extracted frames
img_dir = '/path/to/posetrack18/'

After that, you can run the following PoseTrack18 experiment. All output files, including the trained models, will be saved in the ${POSEWARPER_SUPP_ROOT}/posetrack18_experiments/ directory.

Comparison to State-of-the-Art

cd ${POSEWARPER_ROOT}
python scripts/posetrack18_helper.py

Changing the Number of GPUs

Our experiments were conducted using 8 NVIDIA P100 GPUs. If you want to use fewer GPUs, you need to modify the *.yaml configuration files in experiments/posetrack/hrnet/. Specifically, you need to modify the GPUS entry in each configuration file. Depending on how many GPUs are used during training, you might also need to change the TRAIN.BATCH_SIZE_PER_GPU entry in the configuration files.

In addition to using 8 GPUs, we also tried using 4 GPUs for our experiments. With the 4-GPU setup, we obtained results similar to those with 8 GPUs without changing TRAIN.BATCH_SIZE_PER_GPU. However, note that the experiments will run substantially slower when fewer GPUs are used.
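
Editing the GPUS entry in every configuration file by hand is repetitive, so the change can be scripted. The sketch below is only an illustration: the helper `set_gpus` is hypothetical, and it assumes that each file contains exactly one GPUS entry written on a single line, for example `GPUS: (0,1,2,3,4,5,6,7)`. Check your own configuration files before relying on it.

```python
import re


def set_gpus(config_text, num_gpus):
    """Rewrite the GPUS entry of a config to use GPUs 0 .. num_gpus - 1.

    Assumption: the entry sits on a single line such as
    `GPUS: (0,1,2,3,4,5,6,7)` and appears exactly once in the file.
    """
    gpu_ids = ",".join(str(i) for i in range(num_gpus))
    new_text, count = re.subn(
        r"^([ \t]*GPUS:[ \t]*).*$",   # keep the key, replace the value
        rf"\g<1>({gpu_ids})",
        config_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected exactly one GPUS entry, found {count}")
    return new_text
```

To apply it, read each *.yaml file in experiments/posetrack/hrnet/ as text, pass the contents through `set_gpus`, and write the result back. All other entries, including TRAIN.BATCH_SIZE_PER_GPU, are left untouched.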

Citation

If you use our code or models in your research, please cite our NeurIPS 2019 paper:

@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph and Tran, Du and Shi, Jianbo and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 32},
year = {2019},
}

Acknowledgement

Our PoseWarper implementation is built on top of the Deep High Resolution Network (HRNet) implementation. We thank the authors for releasing their code.
