Scalable single-person pose estimation

License: Apache License 2.0

EfficientPose

Publicly accessible implementation of scalable single-person pose estimation, as introduced in "EfficientPose: Scalable single-person pose estimation". We provide a simple, intuitive interface for high-precision movement extraction from 2D images, videos, or directly from your webcam.

NOTE: All data remains safely on your computer during use.

1. Plug

Assuming you have Python (>= 3.7) and FFmpeg (>= 4.4) installed, simply run:

pip install -r requirements.txt

2. Play

Say the magical two words:

python track.py

Explore

Did I forget to mention flexibility? There is plenty of it!

You are provided with these options (which can be freely combined):

  • Path (--path, -p): Tell the program which file (i.e., video or image) you want to analyze. Omit this option for camera-based tracking. For example: python track.py --path=utils/MPII.jpg

  • Model (--model, -m): Choose a model (EfficientPose RT through IV, or EfficientPose RT Lite through II Lite) depending on your computational resources and precision requirements. For more details, see the performance comparison under Evidence. For example: python track.py --model=II_Lite

  • Framework (--framework, -f): Have a specific preference for a deep learning framework? We provide models in Keras, TensorFlow, TFLite and PyTorch. In general, TensorFlow is recommended for maximal precision with low computational overhead on GPU, while TFLite (and PyTorch in the case of ARM CPUs with QNNPACK) supports use in resource-constrained applications, such as smartphones. For example: python track.py --framework=tensorflow

  • Visualize predictions (--visualize, -v): Draws the keypoint predictions on top of the image/video you provided and stores the resulting file in the folder of the original file. For example: python track.py --path=utils/MPII.jpg --visualize

  • Save predictions (--store, -s): Stores the predicted coordinates of 16 keypoints (top of head, upper neck, shoulders, elbows, wrists, thorax, pelvis, hips, knees, and ankles) from image/video/camera as a CSV file. For example: python track.py --store
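Since the options above can be combined, it may help to see them assembled into one call. The following is a minimal sketch, not part of the repository: `build_track_command` is a hypothetical helper that only uses the flags documented above and assumes it is run from the repository root, where track.py lives.

```python
import sys


def build_track_command(path=None, model=None, framework=None,
                        visualize=False, store=False):
    """Assemble a track.py command line from the documented options.

    Hypothetical helper for illustration; it is not part of EfficientPose.
    """
    command = [sys.executable, "track.py"]
    if path is not None:  # leave --path out for camera-based tracking
        command.append(f"--path={path}")
    if model is not None:
        command.append(f"--model={model}")
    if framework is not None:
        command.append(f"--framework={framework}")
    if visualize:
        command.append("--visualize")
    if store:
        command.append("--store")
    return command


command = build_track_command(path="utils/MPII.jpg", model="II_Lite",
                              framework="tensorflow", visualize=True,
                              store=True)
# Show the arguments that follow the Python interpreter.
print(" ".join(command[1:]))
```

The resulting list can be handed to `subprocess.run(command, check=True)` to start the analysis. Building the arguments as a list avoids shell quoting problems with file paths that contain spaces.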

Evidence

| Model | Resolution | Parameters | FLOPs | PCKh@50 (MPII val) | PCKh@10 (MPII val) | PCKh@50 (MPII test) | PCKh@10 (MPII test) |
|---|---|---|---|---|---|---|---|
| EfficientPose RT Lite* | 224x224 | 0.40M | 0.86G | 80.6 | 23.1 | - | - |
| EfficientPose RT | 224x224 | 0.46M | 0.87G | 82.9 | 23.6 | 84.8 | 24.2 |
| EfficientPose I Lite* | 256x256 | 0.59M | 1.54G | 83.7 | 27.7 | - | - |
| EfficientPose I | 256x256 | 0.72M | 1.67G | 85.2 | 26.5 | - | - |
| EfficientPose II Lite* | 368x368 | 1.46M | 7.25G | 87.1 | 30.8 | - | - |
| EfficientPose II | 368x368 | 1.73M | 7.70G | 88.2 | 30.2 | - | - |
| EfficientPose III | 480x480 | 3.23M | 23.35G | 89.5 | 30.9 | - | - |
| EfficientPose IV | 600x600 | 6.56M | 72.89G | 89.8 | 35.6 | 91.2 | 36.0 |
| OpenPose (Cao et al.) | 368x368 | 25.94M | 160.36G | 87.6 | 22.8 | 88.8 | 22.5 |

*EfficientPose Lite models modify the original EfficientPose architecture to target deployment on edge devices through low-latency inference. In particular, the EfficientNet backbones are replaced by EfficientNet Lite models of similar scaling, parallel feature extraction and cross-resolution features are omitted, squeeze-and-excitation modules are removed, and E-swish activations are replaced by ReLU6.
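To make the activation swap concrete, here is a minimal scalar sketch of the two functions, using the common definitions E-swish(x) = beta * x * sigmoid(x) and ReLU6(x) = min(max(0, x), 6). The default `beta=1.25` is an assumption chosen for illustration, and this is not the code used in the released models.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def e_swish(x, beta=1.25):
    """E-swish: beta * x * sigmoid(x). The default beta is an assumption."""
    return beta * x * sigmoid(x)


def relu6(x):
    """ReLU6: clamp x to the range [0, 6]."""
    return min(max(0.0, x), 6.0)


# Compare the two activations on a few sample inputs.
for value in (-2.0, 0.0, 2.0, 8.0):
    print(value, round(e_swish(value), 4), relu6(value))
```

ReLU6 needs only comparisons and has a bounded output, which tends to suit quantized inference on edge devices, whereas E-swish requires an exponential per activation.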

All models were trained with a similar optimization procedure, and precision was evaluated on the single-person MPII benchmark in terms of PCKh@50 and PCKh@10. Because the number of submissions to the MPII test set is restricted, only EfficientPose RT and IV, and the baseline method OpenPose, were officially evaluated on it.
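For readers unfamiliar with the metric: PCKh@50 counts a predicted keypoint as correct when it lies within 50% of the head segment length of the ground-truth location, and PCKh@10 applies the stricter 10% threshold. The sketch below illustrates the idea for a single person. The function `pckh` and the toy coordinates are made up for illustration, and this is not the official MPII evaluation code.

```python
import math


def pckh(predicted, ground_truth, head_size, threshold=0.5):
    """Percentage of keypoints within threshold * head_size of the truth."""
    if not ground_truth or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty keypoint lists of equal length")
    max_distance = threshold * head_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= max_distance
    )
    return 100.0 * correct / len(ground_truth)


# Toy example: made-up pixel coordinates and a head segment length of 40 px.
truth = [(100, 50), (100, 90), (80, 100), (120, 100)]
preds = [(102, 51), (100, 105), (85, 100), (150, 100)]
print(pckh(preds, truth, head_size=40, threshold=0.5))  # 75.0
print(pckh(preds, truth, head_size=40, threshold=0.1))  # 25.0
```

The same predictions score much lower at the 10% threshold, which is why PCKh@10 is the more demanding measure of localization precision.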

Guidelines

To achieve the optimal precision provided by the software, please adhere to the following three principles:

  1. Ensure there is only one person present in the image/video
  2. Ensure that the full body of the person is clearly visible, is near the centre of the image/video frame, and is occupying a sufficient portion of the camera view (portrait mode is recommended when feasible)
  3. Avoid occlusion of the subject by other objects; even partial occlusion is discouraged
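Part of the second principle can be checked automatically once keypoints are available. The sketch below is a rough illustration only: `check_framing` is a hypothetical helper, and the thresholds (the subject spans at least half the frame height and sits within 20% of the frame centre) are arbitrary assumptions, not values recommended by the authors. It does not detect additional people or occlusion.

```python
def check_framing(keypoints, width, height,
                  min_height_fraction=0.5, max_center_offset=0.2):
    """Check subject size and centring from (x, y) pixel keypoints.

    Hypothetical helper; both thresholds are illustrative assumptions.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    # Fraction of the frame height spanned by the keypoints.
    height_fraction = (max(ys) - min(ys)) / height
    # Centre of the keypoint bounding box in normalized coordinates.
    center_x = (min(xs) + max(xs)) / 2 / width
    center_y = (min(ys) + max(ys)) / 2 / height
    return {
        "large_enough": height_fraction >= min_height_fraction,
        "centred": (abs(center_x - 0.5) <= max_center_offset
                    and abs(center_y - 0.5) <= max_center_offset),
    }


# Made-up keypoints in a 600x480 frame: a well-framed and a poorly framed subject.
well_framed = [(300, 60), (280, 200), (320, 200), (290, 420), (310, 420)]
poorly_framed = [(50, 40), (45, 80), (55, 80), (50, 120)]
print(check_framing(well_framed, width=600, height=480))
print(check_framing(poorly_framed, width=600, height=480))
```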

Acknowledgment

The work is conducted as a collaboration between the Department of Neuromedicine and Movement Science and the Department of Computer Science at the Norwegian University of Science and Technology. State-of-the-art computational infrastructure is provided by the Norwegian Open AI Lab. We are also grateful to TensorFlow and Data Scientist Pavel Yakubovskiy for making EfficientNet and EfficientNet Lite models more widely available; this provided a great starting point for our research.

Citation

If you enjoyed this project or found the work helpful in your research, please cite the following:

@article{groos2021efficientpose,
  title={EfficientPose: Scalable single-person pose estimation},
  author={Groos, Daniel and Ramampiaro, Heri and Ihlen, Espen AF},
  journal={Applied Intelligence},
  volume={51},
  number={4},
  pages={2518--2533},
  year={2021},
  publisher={Springer}
}

Main updates

  • April 2021: Provided support for TensorFlow 2
  • November 2020: Launched EfficientPose Lite models in Keras, TensorFlow and TFLite
  • April 2020: Launched EfficientPose models in Keras, TensorFlow, TFLite and PyTorch

Relevant resources