/Gaze-Estimation_openVINO

My work for Udacity Intel® Edge AI for IoT Developers Nanodegree Program. Project Name: Computer Pointer Controller

Primary LanguagePython

Computer Pointer Controller

In this project, we will be using the Gaze Estimation model to estimate the gaze of the user's eyes and change the mouse pointer position accordingly. We will use 4 pre-trained model from openVINO:

  1. face-detection-adas-binary-0001
  2. head-pose-estimation-adas-0001
  3. facial-landmarks-35-adas-0002
  4. gaze-estimation-adas-0002

The Pipeline

The flow of data will look like this:

Project Set Up and Installation

  • Install the openVINO toolkit according to the system being used.

  • Extract the submission file or clone the repository:

    https://github.com/Oktafsurya/Gaze-Estimation_openVINO.git

  • Setup the virtual environment in your system

  • Install all the requirements

    pip install -r requirements.txt

  • Download 4 pre-trained model from openVINO

    python src/model_downloader.py

Demo

Using CPU:

python3 src/main.py --in bin/demo.mp4 --out output/result.avi

Result:

or you can refer to the video result for each model precision

  • FP16 (all model using FP16 precision)
  • FP16-INT8 (all model using FP16-INT8 precision)
  • FP32 (all model using FP32 precision)

Documentation

Command line arguments needed by model_downloader.py

Argument Type Description
--fd Required (with default value) face detection model that want to download.
--fl Required (with default value) facial landmark detection model that want to download.
--hp Required (with default value) head pose estimation model that want to download.
--ge Required (with default value) gaze estimation model that want to download.

Command line arguments needed by main.py

Argument Type Description
--face_det Required (with default value) Path to a face detection model xml file with a trained model.
--facial_land Required (with default value) Path to a facial landmark detection model xml file with a trained model.
--head_pose Required (with default value) Path to a head pose estimation model xml file with a trained model.
--gaze_model Required (with default value) Path to a gaze estimation model xml file with a trained model.
--in Required Path to image or video file or CAM.
--out Required path to output video.

Benchmarks

Benchmarking is done using a laptop with specifications:

  • Brand : ASUS
  • CPU : Intel® Core™ i7-4720HQ CPU @ 2.60GHz × 8
  • Graphics : GeForce GTX 950M/PCIe/SSE2
  • RAM : 8 GB
  • OS : Ubuntu 16.04 LTS 64-bit

CPU

Properties FP16 FP16-INT8 FP32
Total Model Loading 707.41ms 2216.518ms 690.83ms
Total Inference Time 74.1s 74.2s 74.0s
FPS 0.796fps 0.795fps 0.797fps
Loading time each model (in ms) FP16 FP16-INT8 FP32
Face detection * * *
Facial landmark 397.866 1728.518 348.82
Head pose 75.923 192.77 61.65
Gaze estimation 91.259 166.07 75.10

You can refer to the .txt file for benchmarking result for each model precision

Results

From the benchmarking result above, we can conclude that model with lower precision give us faster total inference time, total time to load all model and also fps. Model with higher precision for example FP16-INT8 tend to give slower total inference time, total time to load all model and fps.