
C++ implementation of CRAFT text detector with TensorRT

CRAFT: Character-Region Awareness For Text detection | Paper | Official Pytorch code

Overview

This is a C++ implementation of the CRAFT text detector that uses TensorRT for accelerated inference. Compared to the official PyTorch implementation, it significantly improves text detection efficiency and simplifies deployment.

In testing, inference on an RTX 4090 is about 12x faster than the original CRAFT-pytorch project.

In addition, a Chinese and English video subtitle detection model fine-tuned on a custom dataset is provided; it offers higher accuracy for subtitle detection.

Getting started

Requirements

  • gcc
  • CUDA
  • TensorRT

The tested environment is GCC 7.3.1 + CUDA 11.2 + TensorRT 8.5.3.1.

Generate TRT engine

  1. Download the .pth model and place it in the 'pretrained' directory.

    • Official pretrained model: craft_mlt_25k.pth

    • Chinese and English subtitle detection model, fine-tuned on a custom dataset: epoch_91.pth

  2. .pth to ONNX

    cd engine_generation
    python torch2onnx.py --usefp16 --torch_path ../pretrained/craft_mlt_25k.pth
  3. ONNX to TRT engine (see the engine-loading sketch after this list)

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/trt/lib
    make
    ./onnx2trt ../pretrained/craft_mlt_25k_fp16.onnx ../pretrained/craft_mlt_25k_fp16_dynamic_shape.cache
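
For reference, the .cache file written by onnx2trt is the serialized TensorRT engine that the demos load. If you want to inspect or load it yourself, a minimal sketch using the standard TensorRT C++ API might look like the following (illustrative only; the infer_init interface described below already handles engine loading, and the file path here is just an example):

    #include <NvInfer.h>

    #include <fstream>
    #include <iostream>
    #include <memory>
    #include <vector>

    // Minimal logger required by the TensorRT runtime.
    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char* msg) noexcept override {
            if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
        }
    };

    int main() {
        // Read the serialized engine produced by ./onnx2trt (example path).
        std::ifstream file("../pretrained/craft_mlt_25k_fp16_dynamic_shape.cache",
                           std::ios::binary | std::ios::ate);
        if (!file) { std::cerr << "engine file not found\n"; return 1; }
        const auto size = static_cast<size_t>(file.tellg());
        std::vector<char> blob(size);
        file.seekg(0);
        file.read(blob.data(), static_cast<std::streamsize>(size));

        // Deserialize the blob into an ICudaEngine.
        Logger logger;
        std::unique_ptr<nvinfer1::IRuntime> runtime(
            nvinfer1::createInferRuntime(logger));
        std::unique_ptr<nvinfer1::ICudaEngine> engine(
            runtime->deserializeCudaEngine(blob.data(), blob.size()));
        std::cout << (engine ? "engine loaded" : "deserialization failed") << "\n";
        return 0;
    }

Compile and link against TensorRT (e.g. -I/usr/local/trt/include -L/usr/local/trt/lib -lnvinfer), matching the LD_LIBRARY_PATH shown above.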

Make and run demo

  1. make

    cd src
    make
    cd ..
    make
  2. run demo

    (1) If the input file is in image format:

    ./test_img <engine_path> <input_path>

    example:

    ./test_img ./pretrained/craft_mlt_25k_fp16_dynamic_shape.cache ./images/subtitle2.png

    (2) If the input file is in raw YUV (NV12) format (see the NV12 layout sketch after this list):

    ./test_yuv <engine_path> <height> <width> <yuv_file_path>

    example:

    ./test_yuv ./pretrained/craft_mlt_25k_fp16_dynamic_shape.cache 2160 3840 ./images/test_3840x2160_nv12.yuv
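
For reference, the raw .yuv input is expected in NV12 layout: a full-resolution Y plane followed by an interleaved UV plane with half as many rows, i.e. width * height * 3 / 2 bytes per frame. Below is a minimal sketch of reading one frame and locating the two planes; it assumes a tightly packed file with no row padding, and the plane-pointer ordering is an assumption to be checked against the code in src/.

    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <vector>

    int main() {
        const int width = 3840, height = 2160;                       // example resolution
        const size_t y_size  = static_cast<size_t>(width) * height;  // Y plane bytes
        const size_t uv_size = y_size / 2;                           // interleaved UV plane bytes
        std::vector<uint8_t> frame(y_size + uv_size);

        std::ifstream file("./images/test_3840x2160_nv12.yuv", std::ios::binary);
        if (!file.read(reinterpret_cast<char*>(frame.data()),
                       static_cast<std::streamsize>(frame.size()))) {
            std::cerr << "failed to read one NV12 frame\n";
            return 1;
        }

        // Plane pointers as they would be handed to infer_pipe via in_yuv
        // (assumed order: Y plane first, then the interleaved UV plane).
        uint8_t* planes[2] = { frame.data(), frame.data() + y_size };
        (void)planes;  // pass to infer_pipe together with format and line_size

        std::cout << "Y plane: " << y_size << " bytes, UV plane: "
                  << uv_size << " bytes\n";
        return 0;
    }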

Interface Specification

The following interface can be used to integrate the detector into your own code. A usage sketch follows at the end of this section.

  • Initialization and loading of the TRT engine

    void infer_init(int height, int width, const char* engine_path, float ratio)
    • height: Height of the video/image

    • width: Width of the video/image

    • engine_path: Path to the engine

    • ratio: Scaling ratio for the input image, in the range (0, 1]; 0.5 is a typical value

  • Inference

    (1) If the input file is in image format:

    vector<int> infer_pipe_rgb(uint8_t *rgb)
    • rgb: Memory address of the RGB image stored in planar form

    Returns a vector that sequentially stores x_min, x_max, y_min, y_max for each detected box.

    (2) If the input file is in YUV format:

    vector<int> infer_pipe(uint8_t **in_yuv, int format, int* line_size);
    • in_yuv: Memory addresses of the NV12 Y and UV planes

    Returns a vector that sequentially stores x_min, x_max, y_min, y_max for each detected box.

  • Destruction

    void destroyObj()
    

    Call this function after all inferences are completed.
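
Putting the pieces together, a minimal usage sketch for the planar-RGB path might look like the following. The header name craft_infer.h and the way the planar buffer is filled are assumptions; adapt them to the actual headers and data layout in src/.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    #include "craft_infer.h"  // hypothetical header exposing the interface above

    int main() {
        const int width = 1920, height = 1080;

        // 1. Load the TensorRT engine; 0.5 is the typical scaling ratio.
        infer_init(height, width,
                   "./pretrained/craft_mlt_25k_fp16_dynamic_shape.cache", 0.5f);

        // 2. Build a planar RGB buffer (all R values, then all G, then all B;
        //    this is an assumed interpretation of "stored in planar form").
        //    Fill it from your image loader or video decoder.
        std::vector<uint8_t> planar_rgb(static_cast<size_t>(3) * width * height, 0);

        // 3. Run detection; the result stores x_min, x_max, y_min, y_max per box.
        std::vector<int> boxes = infer_pipe_rgb(planar_rgb.data());
        for (size_t i = 0; i + 3 < boxes.size(); i += 4) {
            std::cout << "box: x_min=" << boxes[i]     << ", x_max=" << boxes[i + 1]
                      << ", y_min=" << boxes[i + 2] << ", y_max=" << boxes[i + 3] << "\n";
        }

        // 4. Release the engine and associated resources after all frames are processed.
        destroyObj();
        return 0;
    }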