grimoire/mmdetection-to-tensorrt

error:batchedNMSPlugin.cpp

Opened this issue · 7 comments

Hi, I ran into this problem:
#assertion/amirstan_plugin/src/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,143
Aborted (core dumped)


Environment:

  • OS: [Ubuntu]
  • python_version: [3.6]
  • pytorch_version: [1.6]
  • cuda_version: [cuda-10.2]
  • cudnn_version: [7.6.5]
  • mmdetection_version: [2.4]

Looking forward to your help. Thank you!

Hi,
Could you provide the script, model file, and test image data?

script: https://github.com/grimoire/mmdetection-to-tensorrt/blob/master/demo/inference.py
model: retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth (http://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth)
config: retinanet_r50_fpn_1x_coco.py (https://github.com/open-mmlab/mmdetection/blob/master/configs/retinanet/retinanet_r50_fpn_1x_coco.py)
image: coco_person (attached)

Hi,
I have tested the image you provided. The converter seems to work.

Here is my test script:

python demo/inference.py \
   test.jpg \
   retinanet_r50_fpn_1x_coco.py \
   retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth \
   retina.pth 

And the result:
(screenshot attached: Screenshot from 2020-09-22 20-10-44)

Could you provide more detail about how to reproduce the error, such as the GPU device type, the arguments you passed to the script, or anything else that might be related?


Hi, sorry to bother you again!
I have it running now, but the results are wrong, like this:
[tensor([[-1070399664]], device='cuda:0', dtype=torch.int32), tensor([[[ 26.3982, -17.1053, 81.1487, 17.1053],
[ 19.2828, -21.5513, 88.2641, 21.5513],
[ 38.4096, -19.2000, 69.1373, 19.2000],
[ 34.4162, -24.1905, 73.1307, 24.1905],
[ 29.3849, -30.4781, 78.1620, 30.4781],
[ 42.9096, -27.1529, 64.6373, 27.1529],
[ 40.0858, -34.2105, 67.4611, 34.2105],
[ 36.5281, -43.1025, 71.0188, 43.1025],
[ 39.7276, -13.5765, 83.1831, 13.5765],
[ 34.0801, -17.1053, 88.8306, 17.1053],
[ 26.9647, -21.5513, 95.9460, 21.5513],
[ 46.0915, -19.2000, 76.8192, 19.2000],
[ 42.0981, -24.1905, 80.8126, 24.1905],
[ 37.0668, -30.4781, 85.8439, 30.4781],
[ 50.5915, -27.1529, 72.3192, 27.1529],
[ 47.7677, -34.2105, 75.1430, 34.2105],
[ 44.2100, -43.1025, 78.7007, 43.1025],
[ 47.4095, -13.5765, 90.8650, 13.5765],
[ 41.7620, -17.1053, 96.5125, 17.1053],
[ 34.6466, -21.5513, 103.6279, 21.5513],
[ 53.7734, -19.2000, 84.5011, 19.2000],
[ 49.7801, -24.1905, 88.4945, 24.1905],
[ 44.7487, -30.4781, 93.5259, 30.4781],
[ 58.2734, -27.1529, 80.0012, 27.1529],
[ 55.4497, -34.2105, 82.8249, 34.2105],
[ 51.8920, -43.1025, 86.3826, 43.1025],
[ 55.0914, -13.5765, 98.5470, 13.5765],
[ 49.4440, -17.1053, 104.1945, 17.1053],
[ 42.3285, -21.5513, 111.3099, 21.5513],
[ 61.4554, -19.2000, 92.1830, 19.2000],
[ 57.4620, -24.1905, 96.1764, 24.1905],
[ 52.4306, -30.4781, 101.2078, 30.4781],
[ 65.9553, -27.1529, 87.6831, 27.1529],
[ 63.1316, -34.2105, 90.5068, 34.2105],
[ 59.5739, -43.1025, 94.0645, 43.1025],
[ 62.7734, -13.5765, 106.2289, 13.5765],
[ 57.1259, -17.1053, 111.8764, 17.1053],
[ 50.0105, -21.5513, 118.9918, 21.5513],
[ 69.1373, -19.2000, 99.8650, 19.2000],
[ 65.1439, -24.1905, 103.8584, 24.1905],
[ 60.1125, -30.4781, 108.8897, 30.4781],
[ 73.6373, -27.1529, 95.3650, 27.1529],
[ 70.8135, -34.2105, 98.1888, 34.2105],
[ 67.2558, -43.1025, 101.7465, 43.1025],
[ 70.4553, -13.5765, 113.9108, 13.5765],
[ 64.8078, -17.1053, 119.5583, 17.1053],
[ 57.6924, -21.5513, 126.6737, 21.5513],
[ 76.8192, -19.2000, 107.5469, 19.2000],
[ 72.8258, -24.1905, 111.5403, 24.1905],
[ 67.7945, -30.4781, 116.5716, 30.4781],
[ 81.3192, -27.1529, 103.0469, 27.1529],
[ 78.4954, -34.2105, 105.8707, 34.2105],
[ 74.9377, -43.1025, 109.4284, 43.1025],
[ 78.1372, -13.5765, 121.5927, 13.5765],
[ 72.4897, -17.1053, 127.2402, 17.1053],
[ 65.3743, -21.5513, 134.3556, 21.5513],
[ 84.5011, -19.2000, 115.2288, 19.2000],
[ 80.5077, -24.1905, 119.2222, 24.1905],
[ 75.4764, -30.4781, 124.2535, 30.4781],
[ 89.0011, -27.1529, 110.7288, 27.1529],
[ 86.1773, -34.2105, 113.5526, 34.2105],
[ 82.6196, -43.1025, 117.1103, 43.1025],
[ 85.8191, -13.5765, 129.2746, 13.5765],
[ 80.1716, -17.1053, 134.9221, 17.1053],
[ 73.0562, -21.5513, 142.0376, 21.5513],
[ 92.1830, -19.2000, 122.9107, 19.2000],
[ 88.1897, -24.1905, 126.9041, 24.1905],
[ 83.1583, -30.4781, 131.9355, 30.4781],
[ 96.6830, -27.1529, 118.4108, 27.1529],
[ 93.8593, -34.2105, 121.2345, 34.2105],
[ 90.3016, -43.1025, 124.7922, 43.1025],
[ 93.5011, -13.5765, 136.9565, 13.5765],
[ 87.8536, -17.1053, 142.6040, 17.1053],
[ 80.7382, -21.5513, 149.7195, 21.5513],
[ 99.8650, -19.2000, 130.5927, 19.2000],
[ 95.8716, -24.1905, 134.5860, 24.1905],
[ 90.8402, -30.4781, 139.6174, 30.4781],
[104.3649, -27.1529, 126.0927, 27.1529],
[101.5412, -34.2105, 128.9164, 34.2105],
[ 97.9835, -43.1025, 132.4741, 43.1025],
[101.1830, -13.5765, 144.6385, 13.5765],
[ 95.5355, -17.1053, 150.2860, 17.1053],
[ 88.4201, -21.5513, 157.4014, 21.5513],
[107.5469, -19.2000, 138.2746, 19.2000],
[103.5535, -24.1905, 142.2679, 24.1905],
[ 98.5221, -30.4781, 147.2993, 30.4781],
[112.0469, -27.1529, 133.7746, 27.1529],
[109.2231, -34.2105, 136.5984, 34.2105],
[105.6654, -43.1025, 140.1561, 43.1025],
[108.8649, -13.5765, 152.3204, 13.5765],
[103.2174, -17.1053, 157.9679, 17.1053],
[ 96.1020, -21.5513, 165.0833, 21.5513],
[115.2288, -19.2000, 145.9565, 19.2000],
[111.2354, -24.1905, 149.9499, 24.1905],
[106.2041, -30.4781, 154.9812, 30.4781],
[119.7288, -27.1529, 141.4565, 27.1529],
[116.9050, -34.2105, 144.2803, 34.2105],
[113.3473, -43.1025, 147.8380, 43.1025],
[116.5468, -13.5765, 160.0023, 13.5765],
[110.8993, -17.1053, 165.6498, 17.1053]]], device='cuda:0'), tensor([[304.0000, -32.0000, 368.0000, 32.0000, 295.6825, -40.3175, 376.3175,
40.3175, 285.2032, -50.7968, 386.7968, 50.7968, 313.3726, -45.2548,
358.6274, 45.2548, 307.4912, -57.0175, 364.5088, 57.0175, 300.0812,
-71.8376, 371.9188, 71.8376, 306.7452, -22.6274, 397.2548, 22.6274,
294.9825, -28.5088, 409.0175, 28.5088, 280.1624, -35.9188, 423.8376,
35.9188, 320.0000, -32.0000, 384.0000, 32.0000, 311.6825, -40.3175,
392.3175, 40.3175, 301.2032, -50.7968, 402.7968, 50.7968, 329.3726,
-45.2548, 374.6274, 45.2548, 323.4912, -57.0175, 380.5088, 57.0175,
316.0812, -71.8376, 387.9188, 71.8376, 322.7452, -22.6274, 413.2548,
22.6274, 310.9825, -28.5088, 425.0175, 28.5088, 296.1624, -35.9188,
439.8376, 35.9188, 336.0000, -32.0000, 400.0000, 32.0000, 327.6825,
-40.3175, 408.3175, 40.3175, 317.2032, -50.7968, 418.7968, 50.7968,
345.3726, -45.2548, 390.6274, 45.2548, 339.4912, -57.0175, 396.5088,
57.0175, 332.0812, -71.8376, 403.9188, 71.8376, 338.7452, -22.6274,
429.2548, 22.6274]], device='cuda:0'), tensor([[348.0812, -71.8376, 419.9188, 71.8376, 354.7452, -22.6274, 445.2548,
22.6274, 342.9825, -28.5088, 457.0175, 28.5088, 328.1624, -35.9188,
471.8376, 35.9188, 368.0000, -32.0000, 432.0000, 32.0000, 359.6825,
-40.3175, 440.3175, 40.3175, 349.2032, -50.7968, 450.7968, 50.7968,
377.3726, -45.2548, 422.6274, 45.2548, 371.4912, -57.0175, 428.5088,
57.0175, 364.0812, -71.8376, 435.9188, 71.8376, 370.7452, -22.6274,
461.2548, 22.6274, 358.9825, -28.5088, 473.0175, 28.5088, 344.1624,
-35.9188, 487.8376, 35.9188, 384.0000, -32.0000, 448.0000, 32.0000,
375.6825, -40.3175, 456.3175, 40.3175, 365.2032, -50.7968, 466.7968,
50.7968, 393.3726, -45.2548, 438.6274, 45.2548, 387.4912, -57.0175,
444.5088, 57.0175, 380.0812, -71.8376, 451.9188, 71.8376, 386.7452,
-22.6274, 477.2548, 22.6274, 374.9825, -28.5088, 489.0175, 28.5088,
360.1624, -35.9188, 503.8376, 35.9188, 400.0000, -32.0000, 464.0000,
32.0000, 391.6825, -40.3175, 472.3175, 40.3175, 381.2032, -50.7968,
482.7968, 50.7968]], device='cuda:0')]

I debugged into torch2trt_dynamic.py. It seems that `self.context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)` doesn't work? Could you give me some advice? Thank you so much!
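For context, a batched-NMS style plugin normally returns a detection count plus padded boxes/scores buffers, and only the first `num_detections` rows are valid; a count like `-1070399664` in the first tensor usually means the output buffer was never written. Here is a sketch of how well-formed outputs would be sliced, using plain Python lists standing in for the CUDA tensors (the function name and dummy values are illustrative, not part of the project):

```python
def slice_detections(num_detections, boxes, scores):
    """Keep only the first num_detections rows; the rest of the buffers are padding."""
    n = num_detections[0]
    if n < 0 or n > len(boxes):
        # A wildly out-of-range count means the buffer holds uninitialized memory.
        raise ValueError(f"implausible detection count {n}: output was likely never written")
    return boxes[:n], scores[:n]

# Hypothetical, well-formed plugin output for one image
boxes, scores = slice_detections(
    [2],
    [[10, 20, 50, 60], [30, 40, 80, 90], [0, 0, 0, 0]],
    [0.9, 0.7, 0.0],
)
print(boxes)  # only the first two boxes are valid

# A garbage count such as -1070399664 would be rejected by the range check above.
```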


Also, would it be convenient to provide a Dockerfile?

execute_async_v2 is the inference entry point of TensorRT. The error is happening inside the model.

The project has changed a lot since my last reply; please reinstall torch2trt_dynamic, amirstan_plugin, and mmdetection-to-tensorrt, then try again.

If the error still exists, you can try creating the TensorRT model and the wrapped PyTorch model as below, and check whether their results differ.

    trt_model, wrap_model = mmdet2trt(cfg_path,
                                      model_path,
                                      opt_shape_param=opt_shape_param,
                                      max_workspace_size=1 << 32,
                                      trt_log_level="INFO",
                                      return_wrap_model=True,
                                      output_names=None)

Then modify anchor_head.py (assuming you are using RetinaNet, right?) to locate the layer that gives you different results. I will see if I can do something.
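The comparison could look something like the sketch below. It uses plain Python lists and a hypothetical `max_abs_diff` helper so it stays self-contained; with the real models you would feed the same image to both, and compare the CUDA tensors with something like `torch.allclose` instead:

```python
def max_abs_diff(a, b):
    """Recursively compare two nested lists of numbers, returning the largest absolute difference."""
    if isinstance(a, list):
        return max(max_abs_diff(x, y) for x, y in zip(a, b))
    return abs(a - b)

# Hypothetical outputs (scores, then one box) from the TensorRT model
# and the wrapped PyTorch model on the same input image
trt_out = [[0.91, 0.88], [10.0, 22.5, 48.0, 63.5]]
wrap_out = [[0.90, 0.88], [10.1, 22.4, 48.0, 63.5]]

diff = max_abs_diff(trt_out, wrap_out)
print("max abs diff:", diff)
if diff > 1e-2:
    print("outputs diverge; bisect the model to find the first layer that differs")
```

Small numerical differences are expected between TensorRT and PyTorch; a large divergence, or garbage values like the ones above, points at a specific layer or plugin to bisect toward.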

A Dockerfile is on my TODO list and will be added in the future.


OK, I will try again~~