
Is the output of the model the result before NMS?

Thank you for your great work. I have some problems when I convert the mmdetection model to the torch model or the TensorRT model.
Is the output of the model the result before NMS? When I print "inference_detector(trt_model, image_path, cfg_path, args.device)", I get a result with 100 entries, and it differs from the post-processed result of the mmdetection model.
I also tested the torch model and the TensorRT model with return_wrap_model=True. The outputs of the two models are the same, but both are tensors with 100 entries, which is far more than my ground truth.

The output has already been through the NMS layer. Invalid boxes are filled with 0 and their cls_id is filled with -1.
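To make the padding convention concrete, here is a minimal sketch of dropping the padded slots after inference. It uses plain Python lists for clarity, and the names `keep_valid`, `boxes`, `scores` and `labels` are hypothetical; the only thing taken from the reply above is that padded entries carry a cls_id of -1 and zero-filled boxes.

```python
def keep_valid(boxes, scores, labels):
    """Drop padded slots, assuming padding is marked by label == -1."""
    keep = [i for i, lab in enumerate(labels) if lab >= 0]
    return (
        [boxes[i] for i in keep],
        [scores[i] for i in keep],
        [labels[i] for i in keep],
    )


# Hypothetical padded output: 4 slots, only the first 2 are real detections.
boxes = [
    [10.0, 10.0, 50.0, 50.0],
    [20.0, 20.0, 60.0, 60.0],
    [0.0, 0.0, 0.0, 0.0],  # padding
    [0.0, 0.0, 0.0, 0.0],  # padding
]
scores = [0.9, 0.8, 0.0, 0.0]
labels = [0, 2, -1, -1]

valid_boxes, valid_scores, valid_labels = keep_valid(boxes, scores, labels)
print(len(valid_boxes), valid_labels)
```

If every slot has a non-negative label, nothing is removed, which would match the case where all 100 entries look valid.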

But all the values in my results are valid. Is the post-processing method the same as in mmdet?

Possibly a stupid question, but I'd like to be sure whether NMS is applied in the converted .engine (I've converted a GFL model to .engine and am using it in C++). Or should I add that post-processing step (NMS) right after the tensorrt->detect call?
If it is applied, then my next question: I saw that mmcv has a flag which turns on cross-class NMS. If I use it, will the NMS within the TensorRT .engine also be cross-class?


@vedrusss NMS is included in the converted model. And it is cross-class by default. See the PyTorch implementation and the TensorRT converter for details.
Actually, the TensorRT NMS implementation here is modified from Nvidia's official plugin, batchedNMSPlugin.

Hi @grimoire, from your PyTorch implementation I can see that NMS is applied to each class separately (inside the for cls_idx in range(scores.shape[2]) loop), and then the NMS results of each class are stacked into the final results. Am I right? If yes, is there a way to do it for all labels together (cross-label)?

@vedrusss Yes, it works just as you understand it.
And I think there is no way to do cross-label NMS for now. You might need to add another single-class NMS after the inference.
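As an illustration of that extra step, below is a minimal sketch of a class-agnostic (cross-label) greedy NMS run on the detections that come out of the engine. It is written in plain Python with boxes as [x1, y1, x2, y2]; the function names and the 0.5 IoU threshold are assumptions for the example, not part of the converter.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS that ignores labels; returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: boxes 0 and 1 overlap heavily but have different labels,
# so per-class NMS would have kept both of them.
boxes = [[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0], [50.0, 50.0, 60.0, 60.0]]
scores = [0.9, 0.8, 0.7]
labels = [0, 1, 1]

keep = class_agnostic_nms(boxes, scores)
final = [(boxes[i], scores[i], labels[i]) for i in keep]
print(keep)
```

Padded slots (cls_id of -1) should be removed before this step, otherwise the zero-filled boxes take part in the suppression loop for no reason.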

@grimoire, I've reviewed Nvidia's BatchedNMSPlugin, and it looks like the shareLocation parameter can be used to force cross-label NMS ("If set to true, the boxes input are shared across all classes. If set to false, the boxes input should account for per-class box data.")
So it looks like I could obtain a TRT .engine with a cross-label NMS layer without re-implementing your BatchedNMS module (of course, the original PyTorch mmdetection model will still do per-class NMS in that case).
I guess I need to investigate the scores and boxes passed to the BatchedNMS forward, to see which dimension contains what, because according to convert_batchednms the shareLocation flag is derived from the boxes (shareLocation = (bboxes.shape[2] == 1)).

@vedrusss I think the shareLocation flag shares the boxes between different classes, but each class is still processed separately. Read the CUDA kernel here.
The block number is const int GS = num_classes;, which means the NMS of different classes is processed in different CUDA blocks. shareLocation is only used to control the offset into the boxes.
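The point can be illustrated with a small plain-Python emulation. This is only a sketch of the behaviour described above, not the plugin's code: the outer loop stands in for one CUDA block per class, `share_location` only changes which box slot is read, and the data layout (boxes as [num_boxes][1 or num_classes][4], scores as [num_boxes][num_classes]) and all names and thresholds are assumptions for the example.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thr, score_thr):
    """Greedy NMS over one class; returns kept indices, best score first."""
    order = sorted(
        (i for i in range(len(boxes)) if scores[i] > score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def per_class_nms(boxes, scores, iou_thr=0.5, score_thr=0.05):
    """Run NMS once per class; returns (cls_idx, box_idx, score) tuples."""
    num_classes = len(scores[0])
    share_location = len(boxes[0]) == 1  # mirrors bboxes.shape[2] == 1
    results = []
    for cls_idx in range(num_classes):  # stands in for one block per class
        loc = 0 if share_location else cls_idx  # the flag only picks the box slot
        cls_boxes = [b[loc] for b in boxes]
        cls_scores = [s[cls_idx] for s in scores]
        for i in greedy_nms(cls_boxes, cls_scores, iou_thr, score_thr):
            results.append((cls_idx, i, cls_scores[i]))
    return results


# Two heavily overlapping shared boxes, each scoring high in a different class.
boxes = [[[0.0, 0.0, 10.0, 10.0]], [[1.0, 1.0, 11.0, 11.0]]]
scores = [[0.9, 0.0], [0.0, 0.8]]

detections = per_class_nms(boxes, scores)
print(detections)
```

Both boxes survive even though they overlap strongly, because the suppression never compares candidates across classes; shared boxes alone do not turn this into cross-label NMS.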

Is the input format of batchedNMSPlugin the same as in the TensorRT repo (

@twmht Yes, actually it is modified from the official one.