grimoire/mmdetection-to-tensorrt

error:batchedNMSPlugin.cpp

Opened this issue · 7 comments

Hi, I ran into this problem:
#assertion/amirstan_plugin/src/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,143
Aborted (core dumped)


Environment:

  • OS: [Ubuntu]
  • python_version: [3.6]
  • pytorch_version: [1.6]
  • cuda_version: [cuda-10.2]
  • cudnn_version: [7.6.5]
  • mmdetection_version: [2.4]

Looking forward to your help. Thank you!

Hi,
Could you provide the script, model file, and test image data?

script: https://github.com/grimoire/mmdetection-to-tensorrt/blob/master/demo/inference.py
model: retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth (http://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth)
config: retinanet_r50_fpn_1x_coco.py (https://github.com/open-mmlab/mmdetection/blob/master/configs/retinanet/retinanet_r50_fpn_1x_coco.py)
image: coco_person (attached)

Hi,
I have tested the image you provided. The converter seems to work.

Here is my test script:

python demo/inference.py \
   test.jpg \
   retinanet_r50_fpn_1x_coco.py \
   retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth \
   retina.pth 

And the result:
(screenshot attached: Screenshot from 2020-09-22 20-10-44)

Could you provide more detail about how to reproduce the error, such as the GPU device type, the arguments you passed to the script, or anything else that might be related?


Hi, sorry to bother you again!
I have it running now, but the results are wrong, like this:
[tensor([[-1070399664]], device='cuda:0', dtype=torch.int32), tensor([[[ 26.3982, -17.1053, 81.1487, 17.1053],
[ 19.2828, -21.5513, 88.2641, 21.5513],
[ 38.4096, -19.2000, 69.1373, 19.2000],
[ 34.4162, -24.1905, 73.1307, 24.1905],
[ 29.3849, -30.4781, 78.1620, 30.4781],
[ 42.9096, -27.1529, 64.6373, 27.1529],
[ 40.0858, -34.2105, 67.4611, 34.2105],
[ 36.5281, -43.1025, 71.0188, 43.1025],
[ 39.7276, -13.5765, 83.1831, 13.5765],
[ 34.0801, -17.1053, 88.8306, 17.1053],
[ 26.9647, -21.5513, 95.9460, 21.5513],
[ 46.0915, -19.2000, 76.8192, 19.2000],
[ 42.0981, -24.1905, 80.8126, 24.1905],
[ 37.0668, -30.4781, 85.8439, 30.4781],
[ 50.5915, -27.1529, 72.3192, 27.1529],
[ 47.7677, -34.2105, 75.1430, 34.2105],
[ 44.2100, -43.1025, 78.7007, 43.1025],
[ 47.4095, -13.5765, 90.8650, 13.5765],
[ 41.7620, -17.1053, 96.5125, 17.1053],
[ 34.6466, -21.5513, 103.6279, 21.5513],
[ 53.7734, -19.2000, 84.5011, 19.2000],
[ 49.7801, -24.1905, 88.4945, 24.1905],
[ 44.7487, -30.4781, 93.5259, 30.4781],
[ 58.2734, -27.1529, 80.0012, 27.1529],
[ 55.4497, -34.2105, 82.8249, 34.2105],
[ 51.8920, -43.1025, 86.3826, 43.1025],
[ 55.0914, -13.5765, 98.5470, 13.5765],
[ 49.4440, -17.1053, 104.1945, 17.1053],
[ 42.3285, -21.5513, 111.3099, 21.5513],
[ 61.4554, -19.2000, 92.1830, 19.2000],
[ 57.4620, -24.1905, 96.1764, 24.1905],
[ 52.4306, -30.4781, 101.2078, 30.4781],
[ 65.9553, -27.1529, 87.6831, 27.1529],
[ 63.1316, -34.2105, 90.5068, 34.2105],
[ 59.5739, -43.1025, 94.0645, 43.1025],
[ 62.7734, -13.5765, 106.2289, 13.5765],
[ 57.1259, -17.1053, 111.8764, 17.1053],
[ 50.0105, -21.5513, 118.9918, 21.5513],
[ 69.1373, -19.2000, 99.8650, 19.2000],
[ 65.1439, -24.1905, 103.8584, 24.1905],
[ 60.1125, -30.4781, 108.8897, 30.4781],
[ 73.6373, -27.1529, 95.3650, 27.1529],
[ 70.8135, -34.2105, 98.1888, 34.2105],
[ 67.2558, -43.1025, 101.7465, 43.1025],
[ 70.4553, -13.5765, 113.9108, 13.5765],
[ 64.8078, -17.1053, 119.5583, 17.1053],
[ 57.6924, -21.5513, 126.6737, 21.5513],
[ 76.8192, -19.2000, 107.5469, 19.2000],
[ 72.8258, -24.1905, 111.5403, 24.1905],
[ 67.7945, -30.4781, 116.5716, 30.4781],
[ 81.3192, -27.1529, 103.0469, 27.1529],
[ 78.4954, -34.2105, 105.8707, 34.2105],
[ 74.9377, -43.1025, 109.4284, 43.1025],
[ 78.1372, -13.5765, 121.5927, 13.5765],
[ 72.4897, -17.1053, 127.2402, 17.1053],
[ 65.3743, -21.5513, 134.3556, 21.5513],
[ 84.5011, -19.2000, 115.2288, 19.2000],
[ 80.5077, -24.1905, 119.2222, 24.1905],
[ 75.4764, -30.4781, 124.2535, 30.4781],
[ 89.0011, -27.1529, 110.7288, 27.1529],
[ 86.1773, -34.2105, 113.5526, 34.2105],
[ 82.6196, -43.1025, 117.1103, 43.1025],
[ 85.8191, -13.5765, 129.2746, 13.5765],
[ 80.1716, -17.1053, 134.9221, 17.1053],
[ 73.0562, -21.5513, 142.0376, 21.5513],
[ 92.1830, -19.2000, 122.9107, 19.2000],
[ 88.1897, -24.1905, 126.9041, 24.1905],
[ 83.1583, -30.4781, 131.9355, 30.4781],
[ 96.6830, -27.1529, 118.4108, 27.1529],
[ 93.8593, -34.2105, 121.2345, 34.2105],
[ 90.3016, -43.1025, 124.7922, 43.1025],
[ 93.5011, -13.5765, 136.9565, 13.5765],
[ 87.8536, -17.1053, 142.6040, 17.1053],
[ 80.7382, -21.5513, 149.7195, 21.5513],
[ 99.8650, -19.2000, 130.5927, 19.2000],
[ 95.8716, -24.1905, 134.5860, 24.1905],
[ 90.8402, -30.4781, 139.6174, 30.4781],
[104.3649, -27.1529, 126.0927, 27.1529],
[101.5412, -34.2105, 128.9164, 34.2105],
[ 97.9835, -43.1025, 132.4741, 43.1025],
[101.1830, -13.5765, 144.6385, 13.5765],
[ 95.5355, -17.1053, 150.2860, 17.1053],
[ 88.4201, -21.5513, 157.4014, 21.5513],
[107.5469, -19.2000, 138.2746, 19.2000],
[103.5535, -24.1905, 142.2679, 24.1905],
[ 98.5221, -30.4781, 147.2993, 30.4781],
[112.0469, -27.1529, 133.7746, 27.1529],
[109.2231, -34.2105, 136.5984, 34.2105],
[105.6654, -43.1025, 140.1561, 43.1025],
[108.8649, -13.5765, 152.3204, 13.5765],
[103.2174, -17.1053, 157.9679, 17.1053],
[ 96.1020, -21.5513, 165.0833, 21.5513],
[115.2288, -19.2000, 145.9565, 19.2000],
[111.2354, -24.1905, 149.9499, 24.1905],
[106.2041, -30.4781, 154.9812, 30.4781],
[119.7288, -27.1529, 141.4565, 27.1529],
[116.9050, -34.2105, 144.2803, 34.2105],
[113.3473, -43.1025, 147.8380, 43.1025],
[116.5468, -13.5765, 160.0023, 13.5765],
[110.8993, -17.1053, 165.6498, 17.1053]]], device='cuda:0'), tensor([[304.0000, -32.0000, 368.0000, 32.0000, 295.6825, -40.3175, 376.3175,
40.3175, 285.2032, -50.7968, 386.7968, 50.7968, 313.3726, -45.2548,
358.6274, 45.2548, 307.4912, -57.0175, 364.5088, 57.0175, 300.0812,
-71.8376, 371.9188, 71.8376, 306.7452, -22.6274, 397.2548, 22.6274,
294.9825, -28.5088, 409.0175, 28.5088, 280.1624, -35.9188, 423.8376,
35.9188, 320.0000, -32.0000, 384.0000, 32.0000, 311.6825, -40.3175,
392.3175, 40.3175, 301.2032, -50.7968, 402.7968, 50.7968, 329.3726,
-45.2548, 374.6274, 45.2548, 323.4912, -57.0175, 380.5088, 57.0175,
316.0812, -71.8376, 387.9188, 71.8376, 322.7452, -22.6274, 413.2548,
22.6274, 310.9825, -28.5088, 425.0175, 28.5088, 296.1624, -35.9188,
439.8376, 35.9188, 336.0000, -32.0000, 400.0000, 32.0000, 327.6825,
-40.3175, 408.3175, 40.3175, 317.2032, -50.7968, 418.7968, 50.7968,
345.3726, -45.2548, 390.6274, 45.2548, 339.4912, -57.0175, 396.5088,
57.0175, 332.0812, -71.8376, 403.9188, 71.8376, 338.7452, -22.6274,
429.2548, 22.6274]], device='cuda:0'), tensor([[348.0812, -71.8376, 419.9188, 71.8376, 354.7452, -22.6274, 445.2548,
22.6274, 342.9825, -28.5088, 457.0175, 28.5088, 328.1624, -35.9188,
471.8376, 35.9188, 368.0000, -32.0000, 432.0000, 32.0000, 359.6825,
-40.3175, 440.3175, 40.3175, 349.2032, -50.7968, 450.7968, 50.7968,
377.3726, -45.2548, 422.6274, 45.2548, 371.4912, -57.0175, 428.5088,
57.0175, 364.0812, -71.8376, 435.9188, 71.8376, 370.7452, -22.6274,
461.2548, 22.6274, 358.9825, -28.5088, 473.0175, 28.5088, 344.1624,
-35.9188, 487.8376, 35.9188, 384.0000, -32.0000, 448.0000, 32.0000,
375.6825, -40.3175, 456.3175, 40.3175, 365.2032, -50.7968, 466.7968,
50.7968, 393.3726, -45.2548, 438.6274, 45.2548, 387.4912, -57.0175,
444.5088, 57.0175, 380.0812, -71.8376, 451.9188, 71.8376, 386.7452,
-22.6274, 477.2548, 22.6274, 374.9825, -28.5088, 489.0175, 28.5088,
360.1624, -35.9188, 503.8376, 35.9188, 400.0000, -32.0000, 464.0000,
32.0000, 391.6825, -40.3175, 472.3175, 40.3175, 381.2032, -50.7968,
482.7968, 50.7968]], device='cuda:0')]

I debugged into torch2trt_dynamic.py. It seems that `self.context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)` doesn't work? Could you give me some advice? Thank you so much!
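For context, a batched-NMS style plugin normally returns a detection count plus padded boxes/scores buffers, and only the first `num_detections` rows are valid; a count like `-1070399664` in the first tensor usually means the output buffer was never written. Here is a sketch of how well-formed outputs would be sliced, using plain Python lists standing in for the CUDA tensors (the function name and dummy values are illustrative, not part of the project):

```python
def slice_detections(num_detections, boxes, scores):
    """Keep only the first num_detections rows; the rest of the buffers are padding."""
    n = num_detections[0]
    if n < 0 or n > len(boxes):
        # A wildly out-of-range count means the buffer holds uninitialized memory.
        raise ValueError(f"implausible detection count {n}: output was likely never written")
    return boxes[:n], scores[:n]

# Hypothetical, well-formed plugin output for one image
boxes, scores = slice_detections(
    [2],
    [[10, 20, 50, 60], [30, 40, 80, 90], [0, 0, 0, 0]],
    [0.9, 0.7, 0.0],
)
print(boxes)  # only the first two boxes are valid

# A garbage count such as -1070399664 would be rejected by the range check above.
```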


Also, would it be convenient to provide a Dockerfile?

execute_async_v2 is the inference entry point of TensorRT. The error is happening inside the model.

The project has changed a lot since my last reply; please reinstall torch2trt_dynamic, amirstan_plugin, and mmdetection-to-tensorrt, then try again.

If the error still exists, you can try creating the TensorRT model and the wrapped PyTorch model as below, and check whether their results differ.

    trt_model, wrap_model = mmdet2trt(cfg_path,
                                      model_path,
                                      opt_shape_param=opt_shape_param,
                                      max_workspace_size=1 << 32,
                                      trt_log_level="INFO",
                                      return_wrap_model=True,
                                      output_names=None)

Then modify anchor_head.py (assuming you are using RetinaNet, right?) to locate the layer that gives you different results. I will see if I can do something.
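The comparison could look something like the sketch below. It uses plain Python lists and a hypothetical `max_abs_diff` helper so it stays self-contained; with the real models you would feed the same image to both, and compare the CUDA tensors with something like `torch.allclose` instead:

```python
def max_abs_diff(a, b):
    """Recursively compare two nested lists of numbers, returning the largest absolute difference."""
    if isinstance(a, list):
        return max(max_abs_diff(x, y) for x, y in zip(a, b))
    return abs(a - b)

# Hypothetical outputs (scores, then one box) from the TensorRT model
# and the wrapped PyTorch model on the same input image
trt_out = [[0.91, 0.88], [10.0, 22.5, 48.0, 63.5]]
wrap_out = [[0.90, 0.88], [10.1, 22.4, 48.0, 63.5]]

diff = max_abs_diff(trt_out, wrap_out)
print("max abs diff:", diff)
if diff > 1e-2:
    print("outputs diverge; bisect the model to find the first layer that differs")
```

Small numerical differences are expected between TensorRT and PyTorch; a large divergence, or garbage values like the ones above, points at a specific layer or plugin to bisect toward.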

A Dockerfile is on my TODO list and will be added in the future.


OK, I will try again~~