
C++ implementation of YOLO-NAS using DeepSparse




YOLO-NAS is a state-of-the-art object detector by Deci AI. This project implements YOLO-NAS in C++ with a DeepSparse backend to speed up inference. DeepSparse is an inference runtime by Neural Magic that accelerates inference on CPUs by exploiting model sparsity.


  • Supports both image and video inference.
  • Faster CPU inference than the PyTorch, ONNX, and OpenVINO backends in the benchmarks below.

Getting Started

The following instructions demonstrate how to build this project on a Linux system. Windows is currently not supported by the DeepSparse library.


  • CMake v3.8+ - found at https://cmake.org/

  • GCC/G++ compiler - found at https://gcc.gnu.org/

  • Python 3.8+ - Used to install the deepsparse library, which is required for the build. Download here.

  • OpenCV v4.0+ - Download here.

  • DeepSparse v1.6.0+ - Download here.

Building the project

  1. Set the OpenCV_DIR environment variable to point to your ../../opencv/build directory (if not set).

  2. Run the following build commands in Bash (Linux):

    cd <yolo-nas-deepsparse-cpp-directory>
    cmake -S. -Bbuild -DCMAKE_BUILD_TYPE=Release
    cd build
    cmake --build .
  3. The compiled executable will be in the root folder of the build directory.


  1. Export the ONNX file:

    from super_gradients.training import models
    model = models.get("yolo_nas_s", pretrained_weights="coco")
    model.prep_model_for_conversion(input_size=(1, 3, 640, 640))
    models.convert_to_onnx(model=model, prep_model_for_conversion_kwargs={"input_size":(1, 3, 640, 640)}, out_path="yolo_nas_s.onnx")
  2. To run the inference, execute the following command:

    yolo-nas-deepsparse-cpp --model <ONNX_MODEL_PATH> [-i <IMAGE_PATH> | -v <VIDEO_PATH>] [--imgsz IMAGE_SIZE] [--gpu] [--iou-thresh IOU_THRESHOLD] [--score-thresh CONFIDENCE_THRESHOLD]
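
The --score-thresh and --iou-thresh options correspond to the usual detector post-processing: discard low-confidence boxes, then apply non-maximum suppression (NMS). The sketch below illustrates that general technique using only the C++ standard library. It is not taken from this project's source; the Detection struct, the filter_detections name, and the choice to suppress per class are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical detection record; the project's own types may differ.
struct Detection {
    float x1, y1, x2, y2;  // box corners in pixels
    float score;           // confidence in [0, 1]
    int class_id;          // predicted class index
};

// Intersection over union of two axis-aligned boxes.
float iou(const Detection& a, const Detection& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drops boxes scoring below score_thresh, then greedily keeps the
// highest-scoring boxes, suppressing any same-class box whose IoU with an
// already kept box exceeds iou_thresh.
std::vector<Detection> filter_detections(std::vector<Detection> dets,
                                         float score_thresh,
                                         float iou_thresh) {
    assert(score_thresh >= 0.0f && score_thresh <= 1.0f);
    assert(iou_thresh >= 0.0f && iou_thresh <= 1.0f);

    // Step 1: confidence filter.
    dets.erase(std::remove_if(dets.begin(), dets.end(),
                              [score_thresh](const Detection& d) {
                                  return d.score < score_thresh;
                              }),
               dets.end());

    // Step 2: sort by descending score so the best boxes are considered first.
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) {
                  return a.score > b.score;
              });

    // Step 3: greedy suppression.
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool suppressed = false;
        for (const Detection& k : kept) {
            if (k.class_id == d.class_id && iou(k, d) > iou_thresh) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) {
            kept.push_back(d);
        }
    }
    return kept;
}
```

In this sketch, raising the score threshold removes more low-confidence boxes before suppression, while lowering the IoU threshold makes suppression more aggressive for overlapping boxes of the same class.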


The following benchmarks were run on Google Colab on an Intel® Xeon® Processor E5-2699 v4 @ 2.20GHz with 2 vCPUs.

| Backend | Latency | FPS | Implementation |
|---|---|---|---|
| PyTorch | 867.02 ms | 1.15 | Native (model.predict() in super_gradients) |
| ONNX C++ (via OpenCV DNN) | 962.27 ms | 1.04 | Hyuto |
| ONNX Python | 626.37 ms | 1.59 | Hyuto |
| OpenVINO C++ | 628.04 ms | 1.59 | Y-T-G |
| DeepSparse C++ | 565.75 ms | 1.83 | Y-T-G |
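
To compare backends, a latency ratio can be easier to read than raw milliseconds. The helper below is a minimal sketch of that arithmetic; the speedup function name is hypothetical, and it uses only the latency figures reported above.

```cpp
#include <cassert>

// Speedup of a backend relative to a baseline, computed from per-frame
// latencies in milliseconds. A value above 1 means the backend is faster.
double speedup(double baseline_ms, double backend_ms) {
    assert(baseline_ms > 0.0 && backend_ms > 0.0);
    return baseline_ms / backend_ms;
}
```

For example, speedup(867.02, 565.75) is roughly 1.53, meaning the DeepSparse C++ backend took about two thirds of the time per frame that native PyTorch did in this benchmark.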



Thanks to @Hyuto for his work on the ONNX implementation of YOLO-NAS in C++, which was used in this project.


This project is licensed under the MIT License - see the LICENSE file for details. The DeepSparse Community edition is only for evaluation, research, and non-production use. See the DeepSparse Community License for more details.