DMHead

Dual-model head pose estimation that fuses state-of-the-art (SOTA) models for 360° 6D head pose detection. All pre-processing and post-processing are fused together, allowing end-to-end processing in a single inference.

Primary language: Python. License: MIT.

1. Summary

  • [Front side] Mask-wearing mode - 6DRepNet (RepVGG-B1g2)

    • Paper

    • Fine-tuned (my own training)

      Yaw: 3.3193, Pitch: 4.9063, Roll: 3.3687, MAE: 3.8648
      
  • [Front side] No-mask mode - SynergyNet (MobileNetV2)

    • Paper

  • [Rear side] WHENet

    • Paper

2. Inference Test

wget https://github.com/PINTO0309/DMHead/releases/download/1.1.2/yolov4_headdetection_480x640_post.onnx
wget https://github.com/PINTO0309/DMHead/releases/download/1.1.2/dmhead_mask_Nx3x224x224.onnx
wget https://github.com/PINTO0309/DMHead/releases/download/1.1.2/dmhead_nomask_Nx3x224x224.onnx

python demo_video.py
python demo_video.py \
[-h] \
[--device DEVICE] \
[--height_width HEIGHT_WIDTH] \
[--mask_or_nomask {mask,nomask}]

optional arguments:
  -h, --help
    Show this help message and exit.

  --device DEVICE
    Path of the mp4 file or device number of the USB camera.
    Default: 0

  --height_width HEIGHT_WIDTH
    {H}x{W}.
    Default: 480x640

  --mask_or_nomask {mask,nomask}
    Select either a model that provides high accuracy when wearing a mask or
    a model that provides high accuracy when not wearing a mask.
    Default: mask
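
The options above can be sketched with `argparse` as follows. This is a minimal illustration of how such a command line could be parsed, not the actual contents of `demo_video.py`. The helper names `parse_device`, `parse_height_width`, and `build_parser` are hypothetical, and the `MODEL_FILES` mapping from mode to downloaded file is an assumption based only on the file names listed above.

```python
import argparse

# Assumption: each mode corresponds to the similarly named file downloaded above.
MODEL_FILES = {
    "mask": "dmhead_mask_Nx3x224x224.onnx",
    "nomask": "dmhead_nomask_Nx3x224x224.onnx",
}


def parse_device(text):
    """Return an int camera index for digit strings, otherwise the path as given."""
    return int(text) if text.isdigit() else text


def parse_height_width(text):
    """Split an '{H}x{W}' string such as '480x640' into (height, width)."""
    height, width = text.lower().split("x")
    return int(height), int(width)


def build_parser():
    """Build a parser that mirrors the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="demo_video.py")
    parser.add_argument("--device", type=parse_device, default=0)
    parser.add_argument("--height_width", type=parse_height_width, default=(480, 640))
    parser.add_argument("--mask_or_nomask", choices=["mask", "nomask"], default="mask")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--device", "0", "--mask_or_nomask", "nomask"])
print(args.device, args.height_width, MODEL_FILES[args.mask_or_nomask])
```

Converting `--device` to an integer only when it consists of digits lets one option serve both as a USB camera number and as a video file path.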

3. Atmosphere

  • August 15, 2022 - MAE: 3.8648

    output_.mp4

4. Benchmark

  • 6DRepNet
  • Official paper (fine-tuned)
    Yaw: 3.6266, Pitch: 4.9066, Roll: 3.3734, MAE: 3.9688
    
  • Trained on 300W-LP (custom, with mask-wearing face image augmentation)
  • Tested on AFLW2000
    • June 20, 2022
      Yaw: 3.6129, Pitch: 5.5801, Roll: 3.8468, MAE: 4.3466
      
    • July 3, 2022 (_epoch_321.pth)
      Yaw: 3.3346, Pitch: 5.0004, Roll: 3.5381, MAE: 3.9577
      
    • August 15, 2022
      Yaw: 3.3193, Pitch: 4.9063, Roll: 3.3687, MAE: 3.8648
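
The MAE figures above appear to be the mean of the three per-axis errors. The short check below recomputes that mean from the rounded Yaw, Pitch, and Roll values listed in this section. It is a sanity check under that assumption, not the evaluation script used to produce the numbers, and the function name `mean_of_axis_errors` is hypothetical. Because the inputs are already rounded to four decimals, the recomputed values can differ from the reported ones in the last digit.

```python
def mean_of_axis_errors(yaw, pitch, roll):
    """Average the three per-axis errors into a single figure."""
    return (yaw + pitch + roll) / 3.0


# (yaw, pitch, roll, reported MAE), copied from the benchmark entries above.
results = {
    "Official paper (fine-tuned)": (3.6266, 4.9066, 3.3734, 3.9688),
    "June 20, 2022": (3.6129, 5.5801, 3.8468, 4.3466),
    "July 3, 2022": (3.3346, 5.0004, 3.5381, 3.9577),
    "August 15, 2022": (3.3193, 4.9063, 3.3687, 3.8648),
}

for label, (yaw, pitch, roll, reported) in results.items():
    recomputed = mean_of_axis_errors(yaw, pitch, roll)
    # Allow a small tolerance because the per-axis inputs are rounded.
    status = "consistent" if abs(recomputed - reported) < 2e-4 else "differs"
    print(f"{label}: recomputed {recomputed:.5f}, reported {reported:.4f} ({status})")
```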
      

5. Model Structure

  • INPUTS: Float32 [N,3,224,224]
  • OUTPUTS: Float32 [N,3], [Yaw,Roll,Pitch]
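
Note that the output columns are ordered Yaw, Roll, Pitch, whereas the benchmark figures in this document are listed as Yaw, Pitch, Roll. The sketch below shows one way to unpack an [N,3] output into named fields so the two orderings are not confused. The `HeadPose` type, the `unpack_head_poses` function, and the sample values are hypothetical and are not part of the repository.

```python
from typing import Iterable, List, NamedTuple, Sequence


class HeadPose(NamedTuple):
    """One head pose, stored in the Yaw, Pitch, Roll order used in the benchmark."""
    yaw: float
    pitch: float
    roll: float


def unpack_head_poses(outputs: Iterable[Sequence[float]]) -> List[HeadPose]:
    """Convert [N,3] rows ordered [Yaw, Roll, Pitch] into HeadPose records."""
    poses = []
    for row in outputs:
        if len(row) != 3:
            raise ValueError(f"expected 3 values per head, got {len(row)}")
        # The model output order is Yaw, Roll, Pitch (see OUTPUTS above).
        yaw, roll, pitch = (float(value) for value in row)
        poses.append(HeadPose(yaw=yaw, pitch=pitch, roll=roll))
    return poses


# Made-up values for two detected heads, used only to show the reordering.
example_output = [[10.0, -2.5, 5.0], [170.0, 0.0, -12.0]]
for pose in unpack_head_poses(example_output):
    print(pose)
```

The function iterates row by row, so it should also accept a NumPy array of shape [N,3] as returned by an ONNX runtime.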

6. References

  1. https://github.com/choyingw/SynergyNet
  2. https://github.com/thohemp/6DRepNet
  3. https://github.com/Ascend-Research/HeadPoseEstimation-WHENet
  4. https://github.com/PINTO0309/Face_Mask_Augmentation
  5. https://github.com/PINTO0309/PINTO_model_zoo/tree/main/383_DirectMHP/post_process_gen_tools
  6. https://github.com/PINTO0309/PINTO_model_zoo/tree/main/383_DirectMHP

7. Citation

@misc{https://doi.org/10.48550/arxiv.2005.10353,
    doi = {10.48550/ARXIV.2005.10353},
    url = {https://arxiv.org/abs/2005.10353},
    author = {Zhou, Yijun and Gregson, James},
    title = {WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose},
    publisher = {arXiv},
    year = {2020},
}
@misc{hempel20226d,
    title={6D Rotation Representation For Unconstrained Head Pose Estimation},
    author={Thorsten Hempel and Ahmed A. Abdelrahman and Ayoub Al-Hamadi},
    year={2022},
    eprint={2202.12555},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
@INPROCEEDINGS{wu2021synergy,
  author={Wu, Cho-Ying and Xu, Qiangeng and Neumann, Ulrich},
  booktitle={2021 International Conference on 3D Vision (3DV)},
  title={Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry},
  year={2021}
}