thaitc-hust/yolov9-tensorrt

added EfficientNMS (code)

Egorundel opened this issue · 3 comments

@thaitc-hust Hello! I edited your code (add_nms_plugins_custom.py) to use EfficientNMS_TRT. It works in C++; I checked it myself.
It also seems that, thanks to EfficientNMS, building the TensorRT engine is faster now, and inference appears to be faster as well.

add_nms_plugins_custom.py:

import onnx
import numpy as np
import onnx_graphsurgeon as gs
import argparse

def create_attrs(keepTopK):
    # Attributes for the EfficientNMS_TRT plugin (see the TensorRT plugin docs).
    attrs = {}
    attrs["background_class"] = -1    # -1: no dedicated background class
    attrs["box_coding"] = 0           # 0: boxes are [x1, y1, x2, y2] corners
    attrs["iou_threshold"] = 0.45     # IoU threshold for suppression
    attrs["max_output_boxes"] = keepTopK
    attrs["plugin_version"] = "1"
    attrs["score_activation"] = 0     # 0: scores are already activated (no sigmoid applied)
    attrs["score_threshold"] = 0.25   # minimum confidence to keep a box
    return attrs

def create_and_add_plugin_node(graph, keepTopK):
    # Fetch the tensors produced by the exported YOLOv9 head; EfficientNMS_TRT
    # expects boxes of shape [batch, num_boxes, 4] and scores of shape
    # [batch, num_boxes, num_classes].
    tensors = graph.tensors()
    boxes_tensor = tensors["bboxes"]
    confs_tensor = tensors["scores"]

    # Declare the four standard EfficientNMS_TRT outputs.
    num_dets = gs.Variable(name="num_dets").to_variable(dtype=np.int32, shape=[-1, 1])
    det_boxes = gs.Variable(name="det_boxes").to_variable(dtype=np.float32, shape=[-1, keepTopK, 4])
    det_scores = gs.Variable(name="det_scores").to_variable(dtype=np.float32, shape=[-1, keepTopK])
    det_classes = gs.Variable(name="det_classes").to_variable(dtype=np.int32, shape=[-1, keepTopK])

    new_outputs = [num_dets, det_boxes, det_scores, det_classes]
    print(new_outputs)

    # Append the plugin node and rewire the graph outputs to the NMS results.
    nms_node = gs.Node(
        op="EfficientNMS_TRT",
        attrs=create_attrs(keepTopK),
        inputs=[boxes_tensor, confs_tensor],
        outputs=new_outputs)

    graph.nodes.append(nms_node)
    graph.outputs = new_outputs

    return graph.cleanup().toposort()

def main():
    parser = argparse.ArgumentParser(description="Add the EfficientNMS_TRT plugin to an exported ONNX model")
    parser.add_argument("-f", "--model", help="Path to the ONNX model generated by export_model.py", default="yolov4_1_3_416_416.onnx")
    parser.add_argument("-k", "--keepTopK", help="Maximum number of bounding boxes kept per image", default=100)

    args, _ = parser.parse_known_args()

    graph = gs.import_onnx(onnx.load(args.model))
    
    graph = create_and_add_plugin_node(graph, int(args.keepTopK))
    
    onnx.save(gs.export_onnx(graph), args.model.replace('.onnx', '') + "-Enms.onnx")


if __name__ == "__main__":
    main()
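
For a quick sanity check before building an engine, here is a minimal sketch that loads the patched model and confirms the four EfficientNMS_TRT outputs are in place (the file name is hypothetical; substitute the -Enms.onnx file the script writes for your model):

import onnx

# Load the patched model and list its graph outputs.
model = onnx.load("yolov9-c-Enms.onnx")  # hypothetical path
print([o.name for o in model.graph.output])
# Expected: ['num_dets', 'det_boxes', 'det_scores', 'det_classes']

From there the engine can be built the usual way, e.g. trtexec --onnx=yolov9-c-Enms.onnx --saveEngine=yolov9-c.engine --fp16; EfficientNMS_TRT is one of TensorRT's bundled plugins (TensorRT 8+), so no extra plugin library is needed.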

@Egorundel Awesome! I will try it.

@Egorundel @thaitc-hust Thank you for the wonderful work! I was considering developing this myself, but more than half of the work is already done here. I'm going to set up the scripts for Nvidia Triton Server and build a test application with Nvidia DeepStream. All that's left is to gather the code into a single command that generates the End2End model; a rough sketch of that glue follows below. Once it's ready, I'll come back here to share.
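
A minimal sketch of that single command, assuming the ONNX file is produced by export_model.py as mentioned in the script's help text (the export flags shown are assumptions, not taken from the repo):

import subprocess

# Hypothetical one-shot wrapper: export to ONNX, then graft EfficientNMS_TRT onto it.
# The export_model.py flags below are assumed; adjust them to the actual export script.
subprocess.run(["python", "export_model.py", "--weights", "yolov9-c.pt"], check=True)
subprocess.run(["python", "add_nms_plugins_custom.py", "-f", "yolov9-c.onnx", "-k", "100"], check=True)
# Produces yolov9-c-Enms.onnx, ready for trtexec / Triton / DeepStream.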

I've created a fork of the original repository, adding End-to-End support for ONNX export.
Check this out: WongKinYiu/yolov9#130 (comment)