A new method to fuse high-level features and low-level features in human pose estimation
This code refers to SimpleBaseline: https://github.com/microsoft/human-pose-estimation.pytorch. we use Semantic Embedding Block (SEB) and Global Convolutional Network (GCN) blocks to bridge the gap between low-level and high-level features. Experiments on MPII and LSP human pose estimation datasets demonstrate that efficient feature fusion can significantly improve the performance.
Method | Input | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean |
---|---|---|---|---|---|---|---|---|---|
SimpleBaseline_ResNet50 | 256x256 | 96.35 | 95.33 | 88.99 | 83.18 | 88.42 | 83.96 | 79.59 | 88.53 |
ours | 256x256 | 96.73 | 95.35 | 89.50 | 83.73 | 88.23 | 84.43 | 79.92 | 88.82 |
SimpleBaseline_ResNet50 | 384x384 | 96.66 | 95.75 | 89.79 | 84.61 | 88.52 | 84.67 | 79.29 | 89.07 |
ours | 384x384 | 96.67 | 95.75 | 90.05 | 85.58 | 88.85 | 84.73 | 79.74 | 89.35 |
python >= 3.6
pytorch >= 1.0.0
- Download the dataset and pretrained model, you can follow the an official pytorch implementation of SimpleBaseline.
- Training the model:
python pose_estimation/train.py \
--cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml
- valid the model:
python pose_estimation/valid.py \
--cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
--flip-test \
--model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar
look forward to multi-scale feature fusion structures.