/part-affinity

Multi-person pose estimation implementation using part affinity fields


Part Affinity Field Implementation in PyTorch

Pure Python and PyTorch implementation of Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. The original Caffe implementation is here.

COCO Multi-person Dataset and Dataloader Setup:

Download train2017.zip, val2017.zip and annotations_trainval2017.zip from the COCO Project. The keypoints description can be found here. Extract the folders and place them in '/data'. Pre-processing of the dataset is done on the fly. To visualize the data loader, use:

python visualize_coco_dataloader.py -data ../data -vizPaf

The data loader depends on the COCO API (pycocotools), which can be installed using:

pip install pycocotools
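
As a rough illustration of what the data loader reads, the sketch below pulls person keypoints out of a COCO annotation file with pycocotools. The function names `keypoints_to_array` and `load_person_keypoints` are hypothetical and are not taken from this repository. The example path also assumes the annotations archive was extracted under ../data/annotations.

```python
import numpy as np


def keypoints_to_array(flat_keypoints):
    """Reshape COCO's flat [x1, y1, v1, x2, y2, v2, ...] list to (K, 3).

    The third column is the COCO visibility flag:
    0 = not labeled, 1 = labeled but not visible, 2 = labeled and visible.
    """
    return np.asarray(flat_keypoints, dtype=np.float32).reshape(-1, 3)


def load_person_keypoints(ann_file, max_images=None):
    """Yield (file_name, [keypoint arrays]) per image (hypothetical helper)."""
    # Imported lazily so the helper above works without pycocotools installed.
    from pycocotools.coco import COCO

    coco = COCO(ann_file)
    cat_ids = coco.getCatIds(catNms=["person"])
    img_ids = sorted(coco.getImgIds(catIds=cat_ids))[:max_images]
    for img_id in img_ids:
        ann_ids = coco.getAnnIds(imgIds=img_id, catIds=cat_ids, iscrowd=None)
        anns = coco.loadAnns(ann_ids)
        # Keep only persons that have at least one annotated keypoint.
        people = [keypoints_to_array(a["keypoints"])
                  for a in anns if a.get("num_keypoints", 0) > 0]
        yield coco.loadImgs(img_id)[0]["file_name"], people


# Example usage (assumed path, adjust to where the annotations were extracted):
# for name, people in load_person_keypoints(
#         "../data/annotations/person_keypoints_train2017.json", max_images=2):
#     print(name, len(people))
```

Joints with a visibility flag of 0 carry no usable coordinates, which is why masking unannotated joints matters when building ground truth.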

Design choices at this stage are: i) width of the part affinity field, ii) heatmap standard deviation, iii) choice of parts for the PAF, iv) PAF magnitude (smooth/rigid), and v) whether to mask crowded/unannotated joints. Because the scale of persons varies across the dataset, some of these choices play an important role during training. The original paper uses a constant PAF with a single part width for all joints across the dataset, but this can introduce considerable noise in the form of misleading ground-truth PAFs/heatmaps. Alternate design choices are exposed in this implementation, with the original choices kept as the defaults.
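
To make the first two design choices concrete, here is a minimal NumPy sketch of ground-truth generation for one keypoint and one limb. It follows the constant ("rigid") PAF definition from the paper: pixels within a fixed distance of the limb segment get the limb's unit vector. The function names and the parameters `sigma` and `limb_width` are illustrative assumptions, not the names used in this repository.

```python
import numpy as np


def gaussian_heatmap(height, width, center, sigma):
    """Gaussian peak with maximum 1.0 at the keypoint `center` = (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    dist_sq = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-dist_sq / (2.0 * sigma ** 2))


def limb_paf(height, width, joint_a, joint_b, limb_width):
    """Constant PAF for the limb joint_a -> joint_b, shape (2, H, W).

    Pixels whose projection falls on the segment and whose perpendicular
    distance is at most `limb_width` get the unit vector; others stay zero.
    """
    a = np.asarray(joint_a, dtype=float)
    b = np.asarray(joint_b, dtype=float)
    paf = np.zeros((2, height, width))
    length = np.linalg.norm(b - a)
    if length < 1e-8:  # degenerate limb: both joints coincide
        return paf
    unit = (b - a) / length
    ys, xs = np.mgrid[0:height, 0:width]
    dx, dy = xs - a[0], ys - a[1]
    along = dx * unit[0] + dy * unit[1]           # projection onto the limb
    across = np.abs(dx * unit[1] - dy * unit[0])  # perpendicular distance
    on_limb = (along >= 0) & (along <= length) & (across <= limb_width)
    paf[0][on_limb] = unit[0]
    paf[1][on_limb] = unit[1]
    return paf


heatmap = gaussian_heatmap(10, 10, center=(3, 4), sigma=2.0)
paf = limb_paf(10, 10, joint_a=(2, 5), joint_b=(7, 5), limb_width=1)
```

With a single `limb_width` for every person, a small person's limb field spills well outside the body while a large person's field covers only a thin strip, which is the kind of ground-truth noise described above.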

NN Model:

The paper uses the first 10 layers of VGG-19 as a feature extractor, followed by 7 heatmap/PAF regression stages with intermediate supervision at each stage. The same is implemented here.
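
The overall structure can be sketched in PyTorch as follows. This is a simplified illustration, not the model code from this repository. The class name `MultiStagePose`, the two-layer regression branches, and the channel counts (19 heatmaps and 38 PAF channels, as in the paper) are assumptions, and the backbone here is randomly initialized rather than loaded from pretrained VGG-19 weights.

```python
import torch
import torch.nn as nn

# VGG-19 layout up to conv4_2: ten conv layers, "M" marks 2x2 max pooling.
VGG19_FIRST_10 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M", 512, 512]


def make_backbone(cfg=VGG19_FIRST_10):
    layers, in_ch = [], 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers), in_ch


def make_branch(in_ch, out_ch, mid_ch=128):
    # Simplified regression head (assumption); the paper's stages are deeper.
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, 1),
    )


class MultiStagePose(nn.Module):
    def __init__(self, n_heatmaps=19, n_pafs=38, n_stages=7):
        super().__init__()
        self.backbone, feat_ch = make_backbone()
        self.paf_branches = nn.ModuleList()
        self.hm_branches = nn.ModuleList()
        for stage in range(n_stages):
            # Later stages see the features plus the previous stage's outputs.
            in_ch = feat_ch if stage == 0 else feat_ch + n_heatmaps + n_pafs
            self.paf_branches.append(make_branch(in_ch, n_pafs))
            self.hm_branches.append(make_branch(in_ch, n_heatmaps))

    def forward(self, x):
        feats = self.backbone(x)
        stage_input, outputs = feats, []
        for paf_branch, hm_branch in zip(self.paf_branches, self.hm_branches):
            paf, hm = paf_branch(stage_input), hm_branch(stage_input)
            outputs.append((paf, hm))  # kept so every stage can be supervised
            stage_input = torch.cat([feats, paf, hm], dim=1)
        return outputs


def intermediate_supervision_loss(outputs, paf_gt, hm_gt):
    """Sum of per-stage MSE losses against the same ground truth."""
    mse = nn.functional.mse_loss
    return sum(mse(paf, paf_gt) + mse(hm, hm_gt) for paf, hm in outputs)
```

Because of the three pooling layers, the outputs are at 1/8 of the input resolution, so ground-truth heatmaps and PAFs need to be generated at that stride.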

Training and Testing:

python main.py -data ../data -expID vgg19 -model vgg -train

A comprehensive list of options can be found in the opts/ folder. To debug or visualize each image's outputs during training, the -vizOut flag is helpful. 50k iterations take around 11.5 hours with a batch size of 8 on a GTX 1080 GPU.

Sample nose heatmap and nose-eye PAF outputs after 20 epochs of training are shown below: Sample Output

Evaluation:

Evaluation is performed at multiple scales, and the averaged heatmap and PAF are used for decoding the pose. The evaluation pipeline is taken from here.

python eval.py -data ../data -expID vgg19 -loadModel ../exp/vgg19/model_20.pth
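
The multi-scale averaging step can be sketched as below. This is an illustrative NumPy version under stated assumptions, not the code in eval.py. `predict` stands for any callable that maps a (3, H, W) image to a (heatmaps, pafs) pair, the scale set is a made-up default, and nearest-neighbour resizing is used only to keep the example dependency-free (a real pipeline would more likely use bilinear or bicubic interpolation).

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) array to (C, out_h, out_w)."""
    _, h, w = arr.shape
    rows = np.minimum((np.arange(out_h) * h / out_h).astype(int), h - 1)
    cols = np.minimum((np.arange(out_w) * w / out_w).astype(int), w - 1)
    return arr[:, rows[:, None], cols[None, :]]


def multiscale_average(image, predict, scales=(0.5, 1.0, 1.5)):
    """Run `predict` at several scales and average outputs at image size."""
    _, height, width = image.shape
    hm_sum, paf_sum = None, None
    for scale in scales:
        scaled_h = max(1, int(round(height * scale)))
        scaled_w = max(1, int(round(width * scale)))
        heatmaps, pafs = predict(resize_nearest(image, scaled_h, scaled_w))
        # Bring both outputs back to the original resolution before summing.
        heatmaps = resize_nearest(heatmaps, height, width)
        pafs = resize_nearest(pafs, height, width)
        hm_sum = heatmaps if hm_sum is None else hm_sum + heatmaps
        paf_sum = pafs if paf_sum is None else paf_sum + pafs
    return hm_sum / len(scales), paf_sum / len(scales)
```

Averaging after resizing to a common resolution lets small and large persons each be detected at the scale where the network responds best, before keypoints are grouped into poses.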

Sample Output