This project provides a tutorial for NVIDIA's Transfer Learning Toolkit (TLT) and DeepStream (DS) SDK, i.e., the training and inference flow for detecting faces with and without a mask on the Jetson platform.
By the end of this project, you will be able to build a DeepStream app on the Jetson platform that detects faces with and without a mask.
- Transfer Learning Toolkit (TLT) scripts:
  - Dataset processing script to convert the input data to KITTI format
  - Specification files for configuring tlt-train, tlt-prune, tlt-evaluate
- DeepStream (DS) scripts:
  - deepstream-app config files (for a demo on a single camera stream and for detection on a stored video file)
- Trained model for face-mask detection; we go through, step by step, how to produce a DetectNet_v2 (with ResNet18 backbone) model for face-mask detection.
- Dataset for faces with and without mask; we suggest the following datasets based on our experiments:
  - Faces with Mask
    - Kaggle Medical Mask Dataset Download Link
    - MAFA - MAsked FAces Download Link
  - Faces without Mask
    - FDDB Dataset Download Link
    - WiderFace Dataset Download Link

Note: We do not use all the images from the MAFA and WiderFace datasets. Combined, we use about 6,000 faces each with and without a mask.
Install dependencies and Docker Container
- On Training Machine with NVIDIA GPU:
  - Install NVIDIA Docker Container: installation instructions are in the TLT Toolkit Requirements
  - Running the Transfer Learning Toolkit using Docker:
    - Pull the docker container:
      docker pull nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3
    - Run the docker image:
      docker run --gpus all -it -v "/path/to/dir/on/host":"/path/to/dir/in/docker" \
          -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash
  - Clone the Git repo inside the TLT container:
      git clone https://github.com/NVIDIA-AI-IOT/face-mask-detection.git
  - Install data conversion dependencies:
      cd face-mask-detection
      python3 -m pip install -r requirements.txt
- On NVIDIA Jetson:
  - Install the DeepStream SDK
Prepare input data set (On training machine)
We expect the downloaded data in this structure.
Convert data set to KITTI format
cd face-mask-detection
python3 data2kitti.py --kaggle-dataset-path <kaggle dataset absolute directory path> \
    --mafa-dataset-path <mafa dataset absolute directory path> \
    --fddb-dataset-path <FDDB dataset absolute directory path> \
    --widerface-dataset-path <widerface dataset absolute directory path> \
    --kitti-base-path <output directory for storing KITTI formatted annotations> \
    --category-limit <category limit for masked and no-mask faces> \
    --tlt-input-dims_width <tlt input width> \
    --tlt-input-dims_height <tlt input height> \
    --train    # for generating the training dataset
You will see the following output log:
Kaggle Dataset: Total Mask faces: 4154 and No-Mask faces:790
Total Mask Labelled:4154 and No-Mask Labelled:790
MAFA Dataset: Total Mask faces: 1846 and No-Mask faces:232
Total Mask Labelled:6000 and No-Mask Labelled:1022
FDDB Dataset: Mask Labelled:0 and No-Mask Labelled:2845
Total Mask Labelled:6000 and No-Mask Labelled:3867
WideFace: Total Mask Labelled:0 and No-Mask Labelled:2134
----------------------------
Final: Total Mask Labelled:6000
Total No-Mask Labelled:6001
----------------------------
Note: You might get warnings; you can safely ignore them.
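For reference, KITTI detection labels are plain-text files with one line per object. For this 2D detection task only the class name and the bounding-box corners are meaningful; the remaining 3D fields are written as zeros. Assuming the class names used here are mask and no-mask (check the label files generated for your run), a generated label line might look like this:

    mask 0.00 0 0.00 123.00 85.00 241.00 210.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00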
Perform training using TLT training flow
- Use the 'face-mask-detection' Jupyter notebook provided with this repository (a sketch for launching it inside the TLT container is shown below).
- Follow the TLT training flow (outlined step by step later in this document).
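The notebook can be served from inside the TLT container and reached from a browser on the host through the 8888 port mapping used in the docker run command above. The command below is standard Jupyter usage rather than anything specific to this repository:

    # run inside the TLT container, from the face-mask-detection directory
    jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root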
Perform inference using DeepStream SDK on Jetson
- Transfer the model files (.etlt) to the Jetson device; for int8, also transfer the calibration file (calibration.bin).
- Use the config files from /ds_configs/*:
  $ vi config_infer_primary_masknet.txt
  - Modify the model and label paths according to your directory locations: look for tlt-encoded-model, labelfile-path, model-engine-file, int8-calib-file.
  - Modify the confidence threshold and class attributes according to your training: look for classifier-threshold, class-attrs.
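  For illustration only, the path-related entries in config_infer_primary_masknet.txt could look roughly like the sketch below; the file names and locations are placeholders, so substitute your actual paths and the key used when exporting the model.

      [property]
      # encrypted TLT model and the key it was exported with (placeholder values)
      tlt-encoded-model=/home/nvidia/face-mask-detection/detectnet_resnet18.etlt
      tlt-model-key=<your model key>
      # label file with the mask / no-mask class names
      labelfile-path=/home/nvidia/face-mask-detection/labels.txt
      # TensorRT engine produced on first run (or by tlt-converter)
      model-engine-file=/home/nvidia/face-mask-detection/detectnet_resnet18.etlt_b1_gpu0_int8.engine
      # only needed when running in int8 mode
      int8-calib-file=/home/nvidia/face-mask-detection/calibration.bin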
- Use the deepstream_config files:
  $ vi deepstream_app_source1_masknet.txt
  - Modify the model file and config file paths: look for model-engine-file and config-file under [primary-gie].
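  As a rough sketch (paths again placeholders), the [primary-gie] group of the deepstream-app config then points at the engine file and the nvinfer config edited above:

      [primary-gie]
      enable=1
      # TensorRT engine to load (placeholder path)
      model-engine-file=/home/nvidia/face-mask-detection/detectnet_resnet18.etlt_b1_gpu0_int8.engine
      # nvinfer configuration from the previous step
      config-file=config_infer_primary_masknet.txt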
- Use deepstream-app to deploy in real time:
  $ deepstream-app -c deepstream_app_source1_video_masknet_gpu.txt
- We provide two different config files:
  - DS running on GPU only with camera input: deepstream_app_source1__camera_masknet_gpu.txt
  - DS running on GPU only with saved video input: deepstream_app_source1_video_masknet_gpu.txt
Note:
- model-engine-file is generated at the first run; once generated, you can find it in the same directory as the .etlt file.
- If you want to generate the model-engine-file before the first run, use tlt-converter.
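A minimal sketch of generating the engine ahead of time with tlt-converter follows; the key, input dimensions and output node names are assumptions (DetectNet_v2 defaults and a 960x544 training resolution), so check tlt-converter -h and your training spec for the exact values:

    # run on the Jetson device; all values below are placeholders
    tlt-converter -k <your model key> \
        -d 3,544,960 \
        -o output_cov/Sigmoid,output_bbox/BiasAdd \
        -t fp16 \
        -e detectnet_resnet18.engine \
        detectnet_resnet18.etlt
    # for int8, use -t int8 together with -c calibration.bin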
Evaluation results (mAP and inference performance on Jetson Nano, Xavier NX and Xavier):

| Pruned | mAP (Mask/No-Mask) (%) | Nano GPU (FPS) | Xavier NX GPU (FPS) | Xavier NX DLA (FPS) | Xavier GPU (FPS) | Xavier DLA (FPS) |
|--------|------------------------|----------------|---------------------|---------------------|------------------|------------------|
| No | 86.12 (87.59, 84.65) | 6.5 | 125.36 | 30.31 | 269.04 | 61.96 |
| Yes (12%**) | 85.50 (86.72, 84.27) | 21.25 | 279 | 116.2 | 508.32 | 155.5 |
The TLT training flow consists of the following steps:

1. Download the pre-trained model (for the mask detection application we have experimented with DetectNet_v2 with a ResNet18 backbone)
2. Convert the dataset to KITTI format
3. Train the model (tlt-train)
4. Evaluate on validation data or infer on test images (tlt-evaluate, tlt-infer)
5. Prune the trained model (tlt-prune)
   Pruning the model reduces the parameter count and thus improves FPS performance
6. Retrain the pruned model (tlt-train)
7. Evaluate the retrained model on validation data (tlt-evaluate)
8. If accuracy in step (7) has not fallen below the satisfactory range, repeat steps (5), (6) and (7); otherwise go to step (9)
9. Export the trained model from step (6) (tlt-export)
   Choose int8 or fp16 based on your platform needs; for example, Jetson Xavier and Jetson Xavier NX have int8 DLA support
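For step (9), a sketch of an fp16 export with tlt-export is shown below; the model paths and experiment layout are assumptions modeled on typical TLT notebooks, and int8 export additionally requires the calibration options described in the TLT documentation:

    # run inside the TLT container; paths and key are placeholders
    tlt-export detectnet_v2 \
        -m /workspace/experiments/resnet18_detector_pruned_retrained.tlt \
        -o /workspace/experiments/resnet18_detector.etlt \
        -k <your model key> \
        --data_type fp16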
- Transfer Learning Toolkit (TLT) Getting Started
- Pruning Models with NVIDIA Transfer Learning Toolkit
- Evan Danilovich (2020, March). Medical Masks Dataset. Version 1. Retrieved May 14, 2020 from https://www.kaggle.com/ivandanilovich/medical-masks-dataset
- Shiming Ge, Jia Li, Qiting Ye, Zhao Luo; "Detecting Masked Faces in the Wild With LLE-CNNs", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2682-2690
- Vidit Jain and Erik Learned-Miller. "FDDB: A Benchmark for Face Detection in Unconstrained Settings". Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst. 2010
- Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou; "WIDER FACE: A Face Detection Benchmark", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
- MAFA Dataset Google link: courtesy of aome510