Computer Pointer Controller
OpenVINO version: 2020.2.120, Python 3.7.3
In this project I used four pretrained models provided by Intel to build a pointer controller app. These are the steps I followed:
- Face detection: detect the face in the frame (webcam or video).
- Head pose estimation: from the result of the first model, estimate the orientation of the head (yaw, pitch, roll).
- Facial landmarks: pass the cropped face from the first model to the facial landmarks estimation model to detect the eye keypoints (the model also returns the nose tip and the mouth corners).
- Gaze estimation: estimate the gaze direction of the eyes, using the necessary inputs from the previous models.
- Mouse control: combine the results of the models to move the mouse pointer to its new position.
This project has many potential future applications, such as helping people with motor difficulties control the mouse pointer.
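The chain above can be summarized as a per-frame function. This is a minimal sketch, not the code in src/main.py: the `PipelineModels` container and its four callables are hypothetical stand-ins for the model classes in src, each assumed to hide its own preprocessing and output parsing.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

import numpy as np


@dataclass
class PipelineModels:
    """Hypothetical wrappers; each callable hides one model's pre/post-processing."""
    detect_face: Callable[[np.ndarray], Optional[np.ndarray]]                  # frame -> face crop or None
    estimate_head_pose: Callable[[np.ndarray], np.ndarray]                    # face -> [yaw, pitch, roll]
    crop_eyes: Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]          # face -> (eye_0, eye_1)
    estimate_gaze: Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]  # eyes + pose -> [x, y, z]


def process_frame(frame: np.ndarray, models: PipelineModels) -> Optional[np.ndarray]:
    """Run the four-model chain on one frame and return the gaze vector (None if no face)."""
    face = models.detect_face(frame)                      # step 1: face detection
    if face is None:
        # Without a face the rest of the chain has no input, so this frame is skipped.
        return None
    head_pose = models.estimate_head_pose(face)           # step 2: head pose
    eye_0, eye_1 = models.crop_eyes(face)                  # step 3: landmarks -> eye crops
    return models.estimate_gaze(eye_0, eye_1, head_pose)  # step 4: gaze
```

The returned gaze vector is what the last step turns into a pointer movement.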
Project Set Up and Installation
Ubuntu-Linux Instructions:
1) Install OpenVINO (tested with OpenVINO 2020.2.120): https://docs.openvinotoolkit.org/latest/openvino_docs_install_guides_installing_openvino_linux.html
2) Clone this repository.
3) Set up the OpenVINO environment. Ubuntu example: source /opt/intel/openvino/bin/setupvars.sh
4) Download the four models with the OpenVINO Model Downloader: https://docs.openvinotoolkit.org/latest/omz_tools_downloader_README.html#model_downloader_usage
Ubuntu Example:
./downloader.py --name face-detection-adas-binary-0001
Necessary models:
- face-detection-adas-binary-0001
- head-pose-estimation-adas-0001
- landmarks-regression-retail-0009
- gaze-estimation-adas-0002
Install the Python libraries listed in requirements.txt (for example with pip install -r requirements.txt).
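The four downloads can also be scripted. A minimal sketch, assuming downloader.py is in the current directory and that its --name and -o (output directory) options behave as described in the Model Downloader documentation; `downloader_commands` and `download_all` are hypothetical helpers, not part of this repository.

```python
import subprocess
import sys
from typing import List

# The four models required by the app (see the list above).
MODELS = [
    "face-detection-adas-binary-0001",
    "head-pose-estimation-adas-0001",
    "landmarks-regression-retail-0009",
    "gaze-estimation-adas-0002",
]


def downloader_commands(downloader: str = "./downloader.py", output_dir: str = "models") -> List[List[str]]:
    """Build one Model Downloader invocation per model (both paths are assumptions)."""
    return [[sys.executable, downloader, "--name", name, "-o", output_dir] for name in MODELS]


def download_all(downloader: str = "./downloader.py", output_dir: str = "models") -> None:
    """Run the commands one by one; check=True stops at the first failed download."""
    for cmd in downloader_commands(downloader, output_dir):
        subprocess.run(cmd, check=True)
```

Depending on the downloader version, the files may be placed in an intel/ subfolder of the output directory, so they may need to be moved to match the models/ layout shown in the tree.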
Demo
To run the app, run main.py from the repository root, for example:
python src/main.py \
 -fd models/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001 \
 -fl models/landmarks-regression-retail-0009/FP32/landmarks-regression-retail-0009 \
 -hp models/head-pose-estimation-adas-0001/FP16/head-pose-estimation-adas-0001 \
 -ga models/gaze-estimation-adas-0002/FP16/gaze-estimation-adas-0002 \
 -s video \
 -i bin/demo.mp4 \
 -vflag fd hp fl ga \
 -d CPU
Documentation
Flag documentation (required flags: -fd, -fl, -hp, -ga, -d, -i, -s, -vflag):
"-fd" or "--face_detection_model" = Path to an xml file with a face detection model.
"-fl" or "--facial_landmarks_model" = Path to an xml file with a facial landmarks model.
"-hp" or "--head_pose_model" = Path to an xml file with a head pose model.
"-ga" or "--gaze_model" = Path to an xml file with a gaze model.
"-i" or "--input_path" = Path to the input video file.
"-s" or "--input_source" = Input source (video or cam).
"-e" or "--extension" = Path to an extension library (optional).
"-t" or "--threshold" = Probability threshold (optional).
"-d" or "--device" = Target device: CPU, GPU, FPGA or MYRIAD.
"-vflag" or "--visual_flag" = Model outputs to visualize on each frame, one or more of:
Values: fd hp fl ga
fd = face detection, fl = facial landmarks
hp = head pose, ga = gaze
"-vsave" or "--visual_save" = Save a visualization image every 10 frames ('y' or 'n').
Visual examples: see the *_visual.jpg images inside the src folder.
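The flags above map naturally onto argparse. A minimal sketch, not necessarily how src/main.py defines them: the choices come from the flag descriptions, while the 0.6 threshold default and the optional status of -e, -t and -vsave are assumptions.

```python
import argparse


def build_argparser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags (defaults are assumptions)."""
    p = argparse.ArgumentParser(description="Computer Pointer Controller")
    p.add_argument("-fd", "--face_detection_model", required=True, help="Face detection model xml.")
    p.add_argument("-fl", "--facial_landmarks_model", required=True, help="Facial landmarks model xml.")
    p.add_argument("-hp", "--head_pose_model", required=True, help="Head pose model xml.")
    p.add_argument("-ga", "--gaze_model", required=True, help="Gaze model xml.")
    p.add_argument("-i", "--input_path", required=True, help="Path to the input video file.")
    p.add_argument("-s", "--input_source", required=True, choices=["video", "cam"])
    p.add_argument("-e", "--extension", default=None, help="Optional extension library path.")
    p.add_argument("-t", "--threshold", type=float, default=0.6, help="Probability threshold (assumed default).")
    p.add_argument("-d", "--device", required=True, choices=["CPU", "GPU", "FPGA", "MYRIAD"])
    # nargs="+" accepts one or more values; each one is checked against choices.
    p.add_argument("-vflag", "--visual_flag", required=True, nargs="+", choices=["fd", "hp", "fl", "ga"])
    p.add_argument("-vsave", "--visual_save", default="n", choices=["y", "n"])
    return p
```

Keeping -vflag as a list lets the drawing code simply test membership, for example `"ga" in args.visual_flag`.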
Models:
Face Detection Model: https://docs.openvinotoolkit.org/latest/_models_intel_face_detection_adas_binary_0001_description_face_detection_adas_binary_0001.html
Head Pose Estimation Model: https://docs.openvinotoolkit.org/latest/_models_intel_head_pose_estimation_adas_0001_description_head_pose_estimation_adas_0001.html
Facial Landmarks Detection Model: https://docs.openvinotoolkit.org/latest/_models_intel_landmarks_regression_retail_0009_description_landmarks_regression_retail_0009.html
Gaze Estimation Model: https://docs.openvinotoolkit.org/latest/_models_intel_gaze_estimation_adas_0002_description_gaze_estimation_adas_0002.html
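Both the face crop and the eye crops come from parsing model outputs. A minimal NumPy sketch, assuming the output layouts described on the model pages above (face detection: a [1, 1, N, 7] array of [image_id, label, conf, x_min, y_min, x_max, y_max] rows with normalized coordinates; landmarks: 10 normalized values with the two eyes first). The function names, the 0.6 threshold and the 20-pixel eye radius are illustrative assumptions, not the values used in src.

```python
import numpy as np


def crop_first_face(frame, detections, threshold=0.6):
    """Return (face_crop, (x_min, y_min, x_max, y_max)) for the first confident detection.

    Returns (None, None) when no detection passes `threshold`.
    """
    h, w = frame.shape[:2]
    for image_id, _label, conf, *box in np.asarray(detections)[0, 0]:
        if image_id < 0:
            break  # a negative image_id marks the end of the valid rows
        if conf < threshold:
            continue
        # Scale the normalized corners to pixels and clip them to the frame.
        x_min, x_max = (int(np.clip(v * w, 0, w)) for v in (box[0], box[2]))
        y_min, y_max = (int(np.clip(v * h, 0, h)) for v in (box[1], box[3]))
        if x_max > x_min and y_max > y_min:
            return frame[y_min:y_max, x_min:x_max], (x_min, y_min, x_max, y_max)
    return None, None


def crop_eyes(face, landmarks, half_size=20):
    """Cut square patches around the two eye points (landmark points 0 and 1)."""
    h, w = face.shape[:2]
    # Keep (x0, y0, x1, y1) and scale x by the width and y by the height of the face crop.
    points = np.asarray(landmarks, dtype=float).reshape(-1)[:4].reshape(2, 2) * (w, h)
    patches = []
    for x, y in points:
        x0, x1 = int(max(x - half_size, 0)), int(min(x + half_size, w))
        y0, y1 = int(max(y - half_size, 0)), int(min(y + half_size, h))
        patches.append(face[y0:y1, x0:x1])
    return patches[0], patches[1]
```

The gaze model expects 60x60 eye images, so the patches would still need resizing (for example with cv2.resize) before inference.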
Tree:
.
├── bin
│ └── demo.mp4
├── models
│ ├── face-detection-adas-binary-0001
│ │ └── FP32-INT1
│ │ ├── face-detection-adas-binary-0001.bin
│ │ └── face-detection-adas-binary-0001.xml
│ ├── gaze-estimation-adas-0002
│ │ ├── FP16
│ │ │ ├── gaze-estimation-adas-0002.bin
│ │ │ └── gaze-estimation-adas-0002.xml
│ │ ├── FP16-INT8
│ │ │ ├── gaze-estimation-adas-0002.bin
│ │ │ └── gaze-estimation-adas-0002.xml
│ │ └── FP32
│ │ ├── gaze-estimation-adas-0002.bin
│ │ └── gaze-estimation-adas-0002.xml
│ ├── head-pose-estimation-adas-0001
│ │ ├── FP16
│ │ │ ├── head-pose-estimation-adas-0001.bin
│ │ │ └── head-pose-estimation-adas-0001.xml
│ │ ├── FP16-INT8
│ │ │ ├── head-pose-estimation-adas-0001.bin
│ │ │ └── head-pose-estimation-adas-0001.xml
│ │ └── FP32
│ │ ├── head-pose-estimation-adas-0001.bin
│ │ └── head-pose-estimation-adas-0001.xml
│ └── landmarks-regression-retail-0009
│ ├── FP16
│ │ ├── landmarks-regression-retail-0009.bin
│ │ └── landmarks-regression-retail-0009.xml
│ ├── FP16-INT8
│ │ ├── landmarks-regression-retail-0009.bin
│ │ └── landmarks-regression-retail-0009.xml
│ └── FP32
│ ├── landmarks-regression-retail-0009.bin
│ └── landmarks-regression-retail-0009.xml
├── pics
│ ├── facedetection2-fp32-.png
│ ├── facedetection-fp32.png
│ ├── face-fps.png
│ ├── face-inference.png
│ ├── gaze-FP16-INT8.png
│ ├── gaze-FP16.png
│ ├── gaze-FP32.png
│ ├── gaze-fps.png
│ ├── gaze-inference.png
│ ├── head-fps.png
│ ├── head-inference.png
│ ├── headpose-FP16.png
│ ├── landmarks-FP16-INT8.png
│ ├── landmarks-fp16.png
│ ├── landmarks-fp32.png
│ ├── landmarks-fps.png
│ └── landmarks-inference.png
├── project3-intel_excel.pdf
├── README.md
├── requirements.txt
└── src
├── 10_visual.jpg
├── 20_visual.jpg
├── 30_visual.jpg
├── 40_visual.jpg
├── 50_visual.jpg
├── base_model.py
├── facedetection_model.py
├── faciallandmarks_model.py
├── gaze_model.py
├── headpose_model.py
├── input_feeder.py
├── main.py
├── mouse_controller.py
└── __pycache__
├── base_model.cpython-37.pyc
├── facedetection_model.cpython-37.pyc
├── faciallandmarks_model.cpython-37.pyc
├── gaze_model.cpython-37.pyc
├── headpose_model.cpython-37.pyc
├── input_feeder.cpython-37.pyc
└── mouse_controller.cpython-37.pyc
19 directories, 61 files
bin folder -> the provided demo video
models folder -> the models you need in order to run the app (see above for how to download them)
requirements.txt -> the required Python libraries
pics -> screenshots of the DL-Benchmark results for the models
src:
*_visual.jpg files -> saved visualization images (-vsave flag)
base_model.py -> the base model class
facedetection_model.py -> Face detection class that handles the face detection model
faciallandmarks_model.py -> Facial landmarks class that handles the landmarks estimation model
gaze_model.py -> Gaze estimation class that handles the gaze estimation model
headpose_model.py -> Head pose class that handles the head pose estimation model
input_feeder.py -> class that handles the input source (webcam or video file)
mouse_controller.py -> class that moves the mouse pointer using the pyautogui library
project3-intel_excel.pdf -> benchmark table (Excel export)
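The gaze output still has to become a pointer movement. A minimal sketch, not the code in mouse_controller.py: it assumes the common approach of rotating the gaze (x, y) components by the head roll angle (in degrees, from the head pose model) and that the gaze y axis points up while screen y points down; `gaze_to_mouse_delta` and its `precision` scale are hypothetical.

```python
import math
from typing import Sequence, Tuple


def gaze_to_mouse_delta(gaze_vector: Sequence[float], roll_deg: float,
                        precision: float = 500.0) -> Tuple[int, int]:
    """Convert a gaze vector into a relative (dx, dy) pointer move in pixels."""
    gx, gy = gaze_vector[0], gaze_vector[1]
    r = math.radians(roll_deg)
    cs, sn = math.cos(r), math.sin(r)
    # Rotate by the head roll so that tilting the head does not skew the direction.
    x = gx * cs + gy * sn
    y = -gx * sn + gy * cs
    # Flip y: the gaze y axis is assumed to point up, screen y grows downward.
    return int(round(x * precision)), int(round(-y * precision))
```

The resulting offsets could then be passed to pyautogui.moveRel(dx, dy, duration=0.1); a larger `precision` gives bigger, coarser jumps.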
Benchmarks
I used Intel's DL-Benchmark tool, which is included in the OpenVINO toolkit, because I don't have access to Intel's DevCloud yet to test the app on more edge devices such as VPUs, Xeon CPUs and FPGAs. This tool gives a good first idea of what to expect from the available devices (FPS, latency, etc.).
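For a single model, FPS and latency can also be approximated in plain Python. A rough sketch, where `infer` is a hypothetical callable wrapping one synchronous inference; it is not how DL-Benchmark measures performance.

```python
import time
from statistics import mean
from typing import Any, Callable, Sequence, Tuple


def measure(infer: Callable[[Any], Any], inputs: Sequence[Any], warmup: int = 2) -> Tuple[float, float]:
    """Return (fps, mean_latency_ms) for synchronous, one-at-a-time inference."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    for x in inputs[:warmup]:
        infer(x)  # warm-up runs are not timed (first calls are often slower)
    latencies = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        infer(x)
        latencies.append((time.perf_counter() - t0) * 1000.0)
    total = time.perf_counter() - start
    return len(inputs) / total, mean(latencies)
```

With one request at a time, FPS is roughly 1000 / latency_ms; asynchronous execution with several requests in flight can push FPS above that.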
Results
Tested on an Intel Core i5-9600K CPU with Intel® UHD Graphics 630: FPS and latency (inference time) from Intel's DL-Benchmark on a randomly generated dataset.
Per-model results (screenshots in the pics folder): Face detection, Gaze, Facial landmarks, Head Pose
Plots
The FPS and latency (ms) will differ in a real test such as the provided video, but the DL-Benchmark graphs already show that the CPU performs better than the iGPU for this project.
After benchmarking with different precisions on my available devices (CPU and iGPU in the same cases), I can clearly see the effect of precision on model size (fewer MB) across FP32, FP16 and FP16-INT8. Lower precisions store weights and run computations with fewer bits per number (32-bit floats, 16-bit floats, and 8-bit integers for the quantized parts), which is a very useful technique in CV, DL and ML because it yields efficient models. There is always a trade-off, mainly in accuracy: a lower-precision model may give worse results than FP32 or FP16.
In our case FP16-INT8 is not as good as FP16: to keep a good balance between detection accuracy and efficiency, the computations need higher precision. The first two models are the most crucial parts of this chain; if the first model cannot detect the face correctly, the rest of the models will have problems too. FP16-INT8 also does not differ much from FP16 in the benchmark; the only clear difference is the model size. Since I want the right balance of efficiency and accuracy, I chose the FP16 models.
Another observation from the benchmark is the iGPU performance. I believe the iGPU could perform somewhat better with asynchronous inference, which keeps several requests in flight and makes better use of its many cores. GPUs can generally process more frames per second, especially with FP16, because they have many cores and instruction sets optimized for 16-bit floating-point operations.
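The size difference follows directly from bits per weight: 4 bytes in FP32, 2 in FP16 and 1 for INT8-quantized weights, so the same tensor shrinks roughly 2x and 4x at the cost of rounding error. This NumPy sketch illustrates that trade-off on random weights; the symmetric per-tensor INT8 scheme is a simplified stand-in, not the calibration procedure used to produce the FP16-INT8 models.

```python
import numpy as np


def precision_report(weights: np.ndarray) -> dict:
    """Return {precision: (size_in_bytes, mean_abs_error)} for one FP32 weight tensor."""
    weights = np.asarray(weights, dtype=np.float32)
    fp16 = weights.astype(np.float16)
    # Simplified symmetric INT8 quantization: one scale maps [-max, max] to [-127, 127].
    scale = float(np.abs(weights).max()) / 127.0
    int8 = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    stored_and_restored = {
        "FP32": (weights, weights),
        "FP16": (fp16, fp16.astype(np.float32)),
        "INT8": (int8, int8.astype(np.float32) * scale),
    }
    return {
        name: (stored.nbytes, float(np.abs(restored - weights).mean()))
        for name, (stored, restored) in stored_and_restored.items()
    }


if __name__ == "__main__":
    w = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)
    for name, (size, err) in precision_report(w).items():
        print(f"{name}: {size / 1e6:.1f} MB, mean abs error {err:.6f}")
```

The error grows as the size shrinks, which is why a weak first model in the chain can hurt every model after it.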
Edge Cases
From the DL-Benchmark results, the right device for this project may be the CPU, even if an FPGA were available. Another important point is the output of the first model (face detection): if it cannot detect the face correctly, the whole chain fails. So to use this as an edge app, this step needs to be made more robust. One example is to crop the frame to a smaller region that "aims" at the user's face, to handle the case of multiple faces in a crowd: the chain continues with the first detected face, and the results will be poor if that face is not well positioned in front of the camera.
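The cropping idea can be sketched as a central region of interest that is passed to face detection, with the resulting box shifted back to full-frame coordinates. A minimal sketch, assuming the user sits roughly in the center of the image; `center_roi`, `to_frame_coords` and the default `keep=0.6` are hypothetical.

```python
import numpy as np


def center_roi(frame, keep=0.6):
    """Crop the central `keep` fraction (0 < keep <= 1) of a frame.

    Returns (roi, (x_offset, y_offset)) so detections in the ROI can be mapped back.
    """
    if not 0 < keep <= 1:
        raise ValueError("keep must be in (0, 1]")
    frame = np.asarray(frame)
    h, w = frame.shape[:2]
    rh, rw = max(1, int(h * keep)), max(1, int(w * keep))
    y0, x0 = (h - rh) // 2, (w - rw) // 2
    return frame[y0:y0 + rh, x0:x0 + rw], (x0, y0)


def to_frame_coords(box, offset):
    """Shift a pixel box (x_min, y_min, x_max, y_max) from ROI to full-frame coordinates."""
    x_off, y_off = offset
    x_min, y_min, x_max, y_max = box
    return (x_min + x_off, y_min + y_off, x_max + x_off, y_max + y_off)
```

Faces near the edges of the frame are simply never seen by the detector, so the first detection is more likely to be the person in front of the camera.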