SCRFD face detection TensorRT

This repository implements SCRFD face detection with the NVIDIA TensorRT C++ API.

It is based on InsightFace and TensorRTX.

Export SCRFD to ONNX

  1. Clone the InsightFace repository from
  2. In the file detection/scrfd/mmdet/models/dense_heads/, replace these lines:
batch_size = cls_score.shape[0]
cls_score = cls_score.permute(0, 2, 3, 1).reshape(batch_size, -1, self.cls_out_channels).sigmoid()
bbox_pred = bbox_pred.permute(0, 2, 3, 1).reshape(batch_size, -1, 4)
kps_pred = kps_pred.permute(0, 2, 3, 1).reshape(batch_size, -1, 10)

with

cls_score = cls_score.sigmoid()
  3. Download the checkpoint of the desired model from
  4. Generate the ONNX file, for example:
python detection/scrfd/tools/ detection/scrfd/configs/scrfd/ <path_to_ckpt>

After this step, the generated ONNX file scrfd_2.5g_bnkps_shape640x640.onnx should be in detection/scrfd/onnx/.

Create the TensorRT engine and run inference

  1. Clone the scrfd_tensorRT repository
  2. Copy scrfd_2.5g_bnkps_shape640x640.onnx into the models folder
  3. Generate the engine:
alias trtexec="/usr/src/tensorrt/bin/trtexec"
trtexec --onnx=scrfd_2.5g_bnkps_shape640x640.onnx \
        --saveEngine=scrfd_2.5g_bnkps_shape640x640.trt
  4. Verify the generated engine:
polygraphy inspect model scrfd_2.5g_bnkps_shape640x640.trt --model-type=engine

Check that the input and output names and shapes are correct:

Binding Index: 0 (Input)  [Name: input.1]  | Shapes: min=(1, 3, 640, 640), opt=(1, 3, 640, 640), max=(1, 3, 640, 640)
Binding Index: 1 (Output) [Name: bbox_8]   | Shape: (1, 8, 80, 80)
Binding Index: 2 (Output) [Name: kps_8]    | Shape: (1, 20, 80, 80)
Binding Index: 3 (Output) [Name: score_8]  | Shape: (1, 2, 80, 80)
Binding Index: 4 (Output) [Name: bbox_16]  | Shape: (1, 8, 40, 40)
Binding Index: 5 (Output) [Name: kps_16]   | Shape: (1, 20, 40, 40)
Binding Index: 6 (Output) [Name: score_16] | Shape: (1, 2, 40, 40)
Binding Index: 7 (Output) [Name: bbox_32]  | Shape: (1, 8, 20, 20)
Binding Index: 8 (Output) [Name: kps_32]   | Shape: (1, 20, 20, 20)
Binding Index: 9 (Output) [Name: score_32] | Shape: (1, 2, 20, 20)

They should match the names and shapes defined in scrfd_trt.h:

const char* INPUT_BLOB_NAME = "input.1";

const char* OUTPUT_BBOX_8_BLOB_NAME = "bbox_8";
const char* OUTPUT_KPS_8_BLOB_NAME = "kps_8";
const char* OUTPUT_SCORE_8_BLOB_NAME = "score_8";

const char* OUTPUT_BBOX_16_BLOB_NAME = "bbox_16";
const char* OUTPUT_KPS_16_BLOB_NAME = "kps_16";
const char* OUTPUT_SCORE_16_BLOB_NAME = "score_16";

const char* OUTPUT_BBOX_32_BLOB_NAME = "bbox_32";
const char* OUTPUT_KPS_32_BLOB_NAME = "kps_32";
const char* OUTPUT_SCORE_32_BLOB_NAME = "score_32";


const int INPUT_H = 640;
const int INPUT_W = 640;
const int INPUT_SIZE = 3 * INPUT_W * INPUT_H;

const int OUTPUT_BBOX_8_SIZE = 8 * 80 * 80;
const int OUTPUT_KPS_8_SIZE = 20 * 80 * 80;
const int OUTPUT_SCORE_8_SIZE = 2 * 80 * 80;

const int OUTPUT_BBOX_16_SIZE = 8 * 40 * 40;
const int OUTPUT_KPS_16_SIZE = 20 * 40 * 40;
const int OUTPUT_SCORE_16_SIZE = 2 * 40 * 40;

const int OUTPUT_BBOX_32_SIZE = 8 * 20 * 20;
const int OUTPUT_KPS_32_SIZE = 20 * 20 * 20;
const int OUTPUT_SCORE_32_SIZE = 2 * 20 * 20;
  5. Build:
mkdir build
cd build
cmake ..
make -j4
  6. Run inference by specifying the engine and an input image, for example:
./build/scrfd models/scrfd_2.5g_bnkps_shape640x640.trt test_images/worlds-largest-selfie.jpg


Sample output images from the scrfd_2.5g_bnkps model

FPS measurement

To measure FPS, run inference multiple times by passing the number of runs as the last argument, for example:

./build/scrfd models/scrfd_2.5g_bnkps_shape640x640.trt test_images/worlds-largest-selfie.jpg 1000
