
Face Mask Detection using NVIDIA Transfer Learning Toolkit (TLT) and DeepStream for COVID-19


------------------------------------------------------

This sample application is no longer maintained

------------------------------------------------------

face_mask_detection

NVIDIA Developer Blog

This project is a tutorial for NVIDIA's Transfer Learning Toolkit (TLT) and the DeepStream (DS) SDK, i.e. the training and inference flow for detecting faces with and without masks on the Jetson platform.

By the end of this project, you will be able to build a DeepStream app on the Jetson platform that detects faces with and without masks.


What this project includes

  • Transfer Learning Toolkit (TLT) scripts:
    • Dataset processing script to convert the datasets to KITTI format
    • Specification files for configuring tlt-train, tlt-prune, tlt-evaluate
  • DeepStream (DS) scripts:
    • deepstream-app config files (for a demo on a single camera stream and for detection on a stored video file)

What this project does not provide

  • A trained model for face-mask detection; we will go step by step through producing a DetectNet_v2 (ResNet18 backbone) model for face-mask detection.
  • An NVIDIA-specific dataset of faces with and without masks; we suggest the following datasets based on our experiments.

Preferred Datasets

Note: We do not use all the images from MAFA and WiderFace. Combined, we use about 6,000 faces each with and without masks.

Steps to perform face-mask detection:

  • Install dependencies and Docker Container

    • On Training Machine with NVIDIA GPU:
      • Install the NVIDIA Docker container: see the installation instructions in TLT Toolkit Requirements
      • Running Transfer Learning Toolkit using Docker
        • Pull docker container:
          docker pull nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3
        • Run the docker image:
          docker run --gpus all -it -v "/path/to/dir/on/host":"/path/to/dir/in/docker" \
                        -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash
          
      • Clone Git repo in TLT container:
        git clone https://github.com/NVIDIA-AI-IOT/face-mask-detection.git
        
      • Install data conversion dependencies
        cd face-mask-detection
        python3 -m pip install -r requirements.txt
        
    • On NVIDIA Jetson:
  • Prepare input data set (On training machine)

    • We expect the downloaded data to be in this structure.

    • Convert the datasets to KITTI format:

      cd face-mask-detection

      python3 data2kitti.py --kaggle-dataset-path <kaggle dataset absolute directory path> \
                               --mafa-dataset-path <mafa dataset absolute directory path> \
                               --fddb-dataset-path <FDDB dataset absolute directory path> \
                               --widerface-dataset-path <widerface dataset absolute directory path> \
                               --kitti-base-path <output directory for storing KITTI formatted annotations> \
                               --category-limit <category limit for masked and no-mask faces> \
                               --tlt-input-dims_width <tlt input width> \
                               --tlt-input-dims_height <tlt input height> \
                               --train <for generating training dataset>
      

      You will see an output log like the following:

        Kaggle Dataset: Total Mask faces: 4154 and No-Mask faces:790
        Total Mask Labelled:4154 and No-Mask Labelled:790
      
        MAFA Dataset: Total Mask faces: 1846 and No-Mask faces:232
        Total Mask Labelled:6000 and No-Mask Labelled:1022
      
        FDDB Dataset: Mask Labelled:0 and No-Mask Labelled:2845
        Total Mask Labelled:6000 and No-Mask Labelled:3867
      
        WideFace: Total Mask Labelled:0 and No-Mask Labelled:2134
        ----------------------------
        Final: Total Mask Labelled:6000
        Total No-Mask Labelled:6001
        ----------------------------
      

    Note: You might get warnings; you can safely ignore them.
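    The conversion step above writes one KITTI-format label file per image, with one line per face. A minimal sketch of that line format (illustrative only: kitti_label_line is not a function from this repo, and the exact field handling inside data2kitti.py may differ):

    ```python
    # Sketch of the KITTI label line that data2kitti.py produces per object.
    # Fields: class, truncation, occlusion, alpha, 2D bbox (xmin ymin xmax ymax),
    # then seven 3D fields that are unused for 2D detection (left as zeros).

    def kitti_label_line(cls, xmin, ymin, xmax, ymax):
        """Format one KITTI annotation line for a 2D bounding box."""
        return (f"{cls} 0.00 0 0.00 "
                f"{xmin:.2f} {ymin:.2f} {xmax:.2f} {ymax:.2f} "
                "0.00 0.00 0.00 0.00 0.00 0.00 0.00")

    # The two classes in this project are mask and no-mask faces.
    line = kitti_label_line("mask", 10, 20, 110, 140)
    print(line)
    ```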

  • Perform training using TLT training flow

  • Perform inference using DeepStream SDK on Jetson

    • Transfer the model file (.etlt) and, if using INT8, the calibration file (calibration.bin) to the Jetson
    • Use config files from /ds_configs/*: $ vi config_infer_primary_masknet.txt
      • Modify model and label paths: according to your directory locations
        • Look for tlt-encoded-model, labelfile-path, model-engine-file, int8-calib-file
      • Modify confidence_threshold, class-attributes according to training
        • Look for classifier-threshold, class-attrs
    • Use deepstream_config files: $ vi deepstream_app_source1_masknet.txt
      • Modify model file and config file paths:
        • Look for model-engine-file, config-file under primary-gie
    • Use deepstream-app to deploy in real time: $ deepstream-app -c deepstream_app_source1_video_masknet_gpu.txt
    • We provide two different config files:
      • DS running on GPU only with camera input: deepstream_app_source1_camera_masknet_gpu.txt
      • DS running on GPU only with saved video input: deepstream_app_source1_video_masknet_gpu.txt
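The keys listed above live in config_infer_primary_masknet.txt; the fragment below illustrates the kind of edits involved (the paths are placeholders, and the threshold value is an example to tune, not a shipped default):

```ini
[property]
tlt-encoded-model=<path to .etlt model>
labelfile-path=<path to labels file>
model-engine-file=<path to serialized .engine file, generated on first run>
int8-calib-file=<path to calibration.bin, INT8 only>

[class-attrs-all]
# example confidence threshold; tune according to your training results
threshold=0.4
```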

Note:
- model-engine-file is generated on the first run; once generated, you can find it in the same directory as the .etlt file.
- If you want to generate the model-engine-file before the first run, use tlt-converter.
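If you choose to pre-generate the engine, a hedged sketch of a tlt-converter invocation follows (the output-node names shown are the usual ones for DetectNet_v2; the key, input dimensions, and file names are placeholders for your own, and flags should be checked against the TLT documentation):

```shell
# Sketch only; run on the Jetson that will execute the engine.
# <key> is the key used at tlt-export time; -d is the training input dims (C,H,W).
tlt-converter detectnet_v2_masknet.etlt \
    -k <key> \
    -d 3,544,960 \
    -o output_cov/Sigmoid,output_bbox/BiasAdd \
    -e masknet.engine
```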

Evaluation Results on NVIDIA Jetson Platform

| Pruned | mAP (Mask/No-Mask) (%) | Nano GPU (FPS) | Xavier NX GPU (FPS) | Xavier NX DLA (FPS) | Xavier GPU (FPS) | Xavier DLA (FPS) |
|--------|------------------------|----------------|---------------------|---------------------|------------------|------------------|
| No | 86.12 (87.59, 84.65) | 6.5 | 125.36 | 30.31 | 269.04 | 61.96 |
| Yes (12%**) | 85.50 (86.72, 84.27) | 21.25 | 279 | 116.2 | 508.32 | 155.5 |
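The table shows that pruning about 12% of parameters costs roughly 0.6 mAP while multiplying throughput; the gains can be read off directly:

```python
# GPU FPS before and after pruning, taken from the evaluation table above.
unpruned_fps = {"Nano": 6.5, "Xavier NX": 125.36, "Xavier": 269.04}
pruned_fps = {"Nano": 21.25, "Xavier NX": 279.0, "Xavier": 508.32}

for platform, fps in unpruned_fps.items():
    speedup = pruned_fps[platform] / fps
    print(f"{platform} GPU: {speedup:.1f}x faster after pruning")
```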

NVIDIA Transfer Learning Toolkit (TLT) Training Flow

  1. Download the pre-trained model (for the mask-detection application, we experimented with DetectNet_v2 with a ResNet18 backbone)
  2. Convert dataset to KITTI format
  3. Train Model (tlt-train)
  4. Evaluate on validation data or infer on test images (tlt-evaluate, tlt-infer)
  5. Prune trained model (tlt-prune)
    Pruning the model reduces the parameter count, improving FPS performance
  6. Retrain pruned model (tlt-train)
  7. Evaluate re-trained model on validation data (tlt-evaluate)
  8. If accuracy has not fallen below a satisfactory range in step (7), repeat steps (5), (6), (7); otherwise go to step (9)
  9. Export trained model from step (6) (tlt-export)
    Choose INT8 or FP16 based on your platform's needs; for example, Jetson Xavier and Jetson Xavier NX have INT8 DLA support
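As a command sketch, the flow above maps onto the TLT CLI roughly as follows (the subcommand and flag usage follow the TLT 2.0 pattern but are assumptions here; spec files, paths, and $KEY are placeholders):

```shell
# Rough sketch of the TLT training loop; all paths, spec files, and $KEY are
# placeholders, and flags should be checked against the TLT documentation.
tlt-train detectnet_v2 -e train_spec.txt -r results/ -k $KEY              # step 3
tlt-evaluate detectnet_v2 -e train_spec.txt -m results/model.tlt -k $KEY  # step 4
tlt-prune -m results/model.tlt -o pruned/ -k $KEY                         # step 5
tlt-train detectnet_v2 -e retrain_spec.txt -r retrain/ -k $KEY            # step 6 (spec points at pruned model)
tlt-export detectnet_v2 -m retrain/model.tlt -o masknet.etlt -k $KEY      # step 9
```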

Interesting Resources

References

  • Evan Danilovich (2020 March). Medical Masks Dataset. Version 1. Retrieved May 14, 2020 from https://www.kaggle.com/ivandanilovich/medical-masks-dataset
  • Shiming Ge, Jia Li, Qiting Ye, Zhao Luo; "Detecting Masked Faces in the Wild With LLE-CNNs", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2682-2690
  • Vidit Jain and Erik Learned-Miller. "FDDB: A Benchmark for Face Detection in Unconstrained Settings". Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst. 2010
  • Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou; "WIDER FACE: A Face Detection Benchmark", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
  • MAFA Dataset Google Link: Courtesy aome510