A high-performance PyTorch implementation of the paper "DSFD: Dual Shot Face Detector" (CVPR 2019).
NOTE: This implementation is based on the original source code released by the authors of the paper. This repo can only be used for inference with their ResNet-152-based model; all training scripts have been removed. If you want to fine-tune the model, we recommend using the original source code.
- Python >= 3.6
- PyTorch >= 1.1
- Torchvision >= 0.3.0
- OpenCV
If you are familiar with Docker, we recommend using our custom Dockerfile to set up your environment.
- Removal of all unnecessary files for training / loading VGG models.
- Improved inference time by about 30x (from ~6 s to ~0.2 s). This is a rough estimate using `time`, measured on a V100-32GB GPU.
The main improvements in inference time come from:
- Replacing non-maximum suppression with a highly optimized torchvision version
- Refactoring `init_priors` in the SSD model to cache previous prior sizes (no need to generate these on every forward pass)
- Refactoring the forward pass of `Detect` in `utils.py` to perform confidence thresholding before non-maximum suppression
- Minor changes in the forward pass to use PyTorch 1.0 features
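To illustrate why thresholding before non-maximum suppression helps, here is a minimal sketch in plain Python. It is not the code from this repo: the function names and the default thresholds are assumptions chosen for illustration, and a simple greedy NMS stands in for the optimized torchvision version. The point is that NMS compares boxes pairwise, so discarding low-confidence candidates first shrinks the work it has to do.

```python
from typing import List, Sequence

Box = Sequence[float]  # [xmin, ymin, xmax, ymax, score]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (only the first four values are used)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def threshold_then_nms(
    boxes: List[Box],
    confidence_threshold: float = 0.5,  # assumed default, for illustration
    iou_threshold: float = 0.3,         # assumed default, for illustration
) -> List[Box]:
    # Step 1: drop low-confidence candidates so NMS sees fewer boxes.
    candidates = [b for b in boxes if b[4] >= confidence_threshold]
    # Step 2: greedy NMS, highest score first; keep a box only if it does
    # not overlap too much with any box that was already kept.
    candidates.sort(key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in candidates:
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept
```

With tensors, the same idea would be a boolean mask on the scores followed by a call to `torchvision.ops.nms(boxes, scores, iou_threshold)`.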
- Download the original weights file from the original source code repo and place it in `dsfd/weights/`
- Run `python3 test.py`
This will look for images in the `images/` folder and save the results in the same folder with the suffix `_out.jpg`.
To perform detection, you can simply use the following lines:
import cv2
from dsfd.detect import DSFDDetector
weight_path = "dsfd/weights/WIDERFace_DSFD_RES152.pth"
im = cv2.imread("path_to_im.jpg")
detector = DSFDDetector(weight_path)
detections = detector.detect_face(im, confidence_threshold=.5, shrink=1.0)
This will return a tensor with shape `[N, 5]`, where N is the number of faces and the five elements are `[xmin, ymin, xmax, ymax, detection_confidence]`.
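As a rough sketch of how the `[N, 5]` output could be consumed, the helper below turns each row into a small record with the box, its size, and its confidence. The helper name `to_records` and the record layout are assumptions for illustration and are not part of this repo. It accepts any iterable of five-element rows; for a tensor, calling `.tolist()` first yields plain Python lists.

```python
from typing import Dict, Iterable, List, Sequence


def to_records(detections: Iterable[Sequence[float]]) -> List[Dict]:
    """Convert rows of [xmin, ymin, xmax, ymax, confidence] into dicts."""
    records = []
    for xmin, ymin, xmax, ymax, confidence in detections:
        # Cast to float so the result does not depend on the input type.
        xmin, ymin, xmax, ymax = float(xmin), float(ymin), float(xmax), float(ymax)
        records.append(
            {
                "box": (xmin, ymin, xmax, ymax),
                "width": xmax - xmin,
                "height": ymax - ymin,
                "confidence": float(confidence),
            }
        )
    return records
```

For example, `to_records(detections.tolist())` would give one record per detected face, which is convenient for filtering by size or drawing boxes with OpenCV.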
This is a very rough estimate on a 1024x687 image. The reported time is the average over 100 runs, with no cuDNN benchmarking and no fp16 computation.
Device | Time (ms) |
---|---|
CPU (MacOS Mid '14 15-Inch, Intel 2.2GHz i7) | 17,496 |
GPU (1x NVIDIA V100-32GB) | 100 |
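A timing loop along these lines can produce an average over 100 runs. This is a sketch, not the script used for the table above: the function name `average_ms`, the warm-up runs, and the `sync` hook are assumptions added for illustration.

```python
import time
from typing import Callable, Optional


def average_ms(
    fn: Callable[[], object],
    runs: int = 100,
    warmup: int = 3,  # assumed warm-up count, not taken from the table above
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock time of fn() in milliseconds."""
    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure pending asynchronous work has finished
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()  # wait for the timed work to complete before stopping the clock
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs
```

On a GPU, CUDA kernels are launched asynchronously, so passing `sync=torch.cuda.synchronize` keeps the clock from stopping before the work is done. A call might look like `average_ms(lambda: detector.detect_face(im, confidence_threshold=.5, shrink=1.0), sync=torch.cuda.synchronize)`.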
For their results on the WIDER-Face dataset, the authors ran detection over several image scales. This is replicated in the function `get_face_detections`. `use_multiscale_detect` and `use_image_pyramid_detect` are very slow. In most cases, setting all of them to `False` works well and is the fastest option. However, if you want to reproduce any of the results in the paper, we recommend setting all of them to `True`.
NOTE: We have not yet tried to replicate their results on any of the datasets presented in the paper.
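The general idea behind detection over several image scales can be sketched as follows. This is a generic illustration, not the logic of `get_face_detections`: the function name `detect_multiscale`, the `detect_fn` callback, and the scale values are all hypothetical, and the sketch assumes `detect_fn(scale)` returns boxes in the coordinates of the resized image.

```python
from typing import Callable, List, Sequence


def detect_multiscale(
    detect_fn: Callable[[float], List[Sequence[float]]],  # hypothetical callback
    scales: Sequence[float] = (0.5, 1.0, 2.0),            # illustrative values
) -> List[List[float]]:
    """Run a detector at several scales and pool boxes in original coordinates."""
    merged = []
    for scale in scales:
        for xmin, ymin, xmax, ymax, score in detect_fn(scale):
            # Boxes found on a resized image are mapped back by dividing by the scale.
            merged.append([xmin / scale, ymin / scale, xmax / scale, ymax / scale, score])
    return merged
```

The pooled list usually contains the same face several times, once per scale, so a merging step such as non-maximum suppression is typically applied afterwards. Each extra scale costs another full forward pass, which is why this is much slower than single-scale detection.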
If you find this code useful, remember to cite the original authors:
@inproceedings{li2018dsfd,
title={DSFD: Dual Shot Face Detector},
author={Li, Jian and Wang, Yabiao and Wang, Changan and Tai, Ying and Qian, Jianjun and Yang, Jian and Wang, Chengjie and Li, Jilin and Huang, Feiyue},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2019}
}