In this project, you will use a Gaze Detection Model to control the mouse pointer of your computer. You will be using the Gaze Estimation model to estimate the gaze of the user's eyes and change the mouse pointer position accordingly. This project will demonstrate your ability to run multiple models in the same machine and coordinate the flow of data between those models.
You will be using the InferenceEngine API from Intel's OpenVino ToolKit to build the project. The gaze estimation model requires three inputs:
- The head pose
- The left eye image
- The right eye image.
To get these inputs, you will have to use three other OpenVino models:
You will have to coordinate the flow of data from the input, and then amongst the different models and finally to the mouse controller. The flow of data will look like this:
Step1. Download OpenVino Toolkit 2020.1 with all the prerequisites by following this installation guide
Step2. Clone the Repository using git clone https://github.com/bhadreshpsavani/Computer-Pointer-Controller.git
Step3. Create Virtual Environment using command virtualenv venv
in the command prompt
Step4. install all the dependency using pip install requirements.txt
.
Step5: Download model from OpenVino Zoo
using below four commands, This commands will download four required model with all the precisions available in the default output location.
python3 /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name gaze-estimation-adas-0002
python3 /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name face-detection-adas-binary-0001
python3 /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name head-pose-estimation-adas-0001
python3 /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name landmarks-regression-retail-0009
Step1. Open command prompt Activate Virtual Environment
cd venv/Scripts/
activate
Step2. Instantiate OpenVino Environment. For windows use below command
cd C:\Program Files (x86)\IntelSWTools\openvino\bin\
setupvars.bat
Step3. Go back to the project directory src
folder
cd path_of_project_directory
cd src
Step4. Run below commands to execute the project
python main.py -fd ../intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml \
-lr ../intel/landmarks-regression-retail-0009/FP32-INT8/landmarks-regression-retail-0009.xml \
-hp ../intel/head-pose-estimation-adas-0001/FP32-INT8/head-pose-estimation-adas-0001.xml \
-ge ../intel/gaze-estimation-adas-0002/FP32-INT8/gaze-estimation-adas-0002.xml \
-i ../bin/demo.mp4 -flags ff fl fh fg
Command Line Argument Information:
- fd : Specify path of xml file of face detection model
- lr : Specify path of xml file of landmark regression model
- hp : Specify path of xml file of Head Pose Estimation model
- ge : Specify path of xml file of Gaze Estimation model
- i : Specify path of input Video file or cam for Webcam
- flags (Optional): if you want to see preview video in separate window you need to Specify flag from ff, fl, fh, fg like -flags ff fl...(Space seperated if multiple values) ff for faceDetectionModel, fl for landmarkRegressionModel, fh for headPoseEstimationModel, fg for gazeEstimationModel
- probs (Optional): if you want to specify confidence threshold for face detection, you can specify the value here in range(0, 1), default=0.6
- d (Optional): Specify Device for inference, the device can be CPU, GPU, FPGU, MYRID
- o : Specify path of output folder where we will store results
intel: This folder contains models in IR format downloaded from Openvino Model Zoo
src: This folder contains model files, pipeline file(main.py) and utilities
model.py
is the model class file which has common property of all the other model files. It is inherited by all the other model files This folder has 4 model class files, This class files has methods to load model and perform inference.face_detection_model.py
gaze_estimation_model.py
landmark_detection_model.py
head_pose_estimation_model.py
main.py
file used to run complete pipeline of project. It calls has object of all the other class files in the foldermouse_controller.py
is utility to move mouse curser based on mouse coordinates received fromgaze_estimation_model
class predict method.input_feeder.py
is utility to load local video or webcam feedbanchmark.ipynb
,computer_controller_job.sh
, andPrecisionComparsion.ipynb
are for banchmarking result generation for different hardware and model comparison
bin: this folder has demo.mp4
file which can be used to test model
I have checked Inference Time
, Model Loading Time
, and Frames Per Second
model for FP16
, FP32
, and FP32-INT8
of all the models except Face Detection Model
. Face Detection Model
was only available on FP32-INT1
precision.
You can use below commands to get results for respective precisions,
FP16
:
python main.py -fd ../intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml \
-lr ../intel/landmarks-regression-retail-0009/FP16/landmarks-regression-retail-0009.xml \
-hp ../intel/head-pose-estimation-adas-0001/FP16/head-pose-estimation-adas-0001.xml \
-ge ../intel/gaze-estimation-adas-0002/FP16/gaze-estimation-adas-0002.xml \
-d CPU -i ../bin/demo.mp4 -o results/FP16/ -flags ff fl fh fg
FP32
:
python main.py -fd ../intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml \
-lr ../intel/landmarks-regression-retail-0009/FP32/landmarks-regression-retail-0009.xml \
-hp ../intel/head-pose-estimation-adas-0001/FP32/head-pose-estimation-adas-0001.xml \
-ge ../intel/gaze-estimation-adas-0002/FP32/gaze-estimation-adas-0002.xml \
-d CPU -i ../bin/demo.mp4 -o results/FP32/ -flags ff fl fh fg
FP32-INT8
:
python main.py -fd ../intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml \
-lr ../intel/landmarks-regression-retail-0009/FP32-INT8/landmarks-regression-retail-0009.xml \
-hp ../intel/head-pose-estimation-adas-0001/FP32-INT8/head-pose-estimation-adas-0001.xml \
-ge ../intel/gaze-estimation-adas-0002/FP32-INT8/gaze-estimation-adas-0002.xml \
-d CPU -i ../bin/demo.mp4 -o results/FP32-INT8/ -flags ff fl fh fg
precisions = ['FP16', 'FP32', 'FP32-INT8']
Inference Time : [26.6, 26.4, 26.9]
fps : [2.218045112781955, 2.234848484848485, 2.193308550185874]
Model Load Time : [1.6771371364593506, 1.6517729759216309, 5.205628395080566]
Asynchronous Inference
precisions = ['FP16', 'FP32', 'FP32-INT8']
Inference Time : [23.9, 24.7, 24.0]
fps : [2.468619246861925, 2.388663967611336, 2.4583333333333335]
Model Load Time : [0.7770581245422363, 0.7230548858642578, 2.766681432723999]
- From above observations we can say that
FP16
has lowest model time andFP32-INT8
has highest model loading time, the reason for the higher loading time can be said as combination of precisions lead to higher weight of the model forFP32-INT8
. - For
Inference Time
andFPS
,FP32
give slightly better results. There is not much difference for this three different models for this two parameters. - I have tested model for Asynchronous Inference and Synchronous Inference, Asynchronous Inference has better results it has slight improvement in
inference time
andFPS
- Multiple People Scenario: If we encounter multiple people in the video frame, it will always use and give results one face even though multiple people detected,
- No Head Detection: it will skip the frame and inform the user
- lighting condition: We might use HSV based pre-processing steps to minimize error due to different lighting conditions