thb1314/tensorrt-onnx-fasterrcnn-fpn-roialign

app_fasterrcnn

Closed this issue · 53 comments

ONNX parsing error for rpn_backbone_resnet50:
(nbSpatialDims == kernelWeights.shape.nbDims - 2) && "The number of spatial dimensions and the kernel shape doesn't match up for the Conv operator."

@rasbery1
Which TensorRT version are you using? Some users here have hit this error when the TRT version is greater than 8.0.0.0 but less than 8.0.3.4.
TRT 7.2 works.
Versions 8.0.3.4 and above also work, for your reference.
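
For reference, a quick way to check which TensorRT version a Python environment actually ships, using the standard tensorrt Python binding:

    # Print the TensorRT version of the current Python environment.
    import tensorrt as trt
    print(trt.__version__)  # e.g. 8.0.3.4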

So TensorRT should be 7.2, or 8.0.3.4 and above, right?

Yes, and you can check the ONNX node name replacement operation in Step 7 (see the README).

@thb1314 Step 7 is related to exporting the header part, and the node names are changed there. The problem is related to parsing rpn_backbone_resnet50.onnx. The error occurs while importing the following node. Is this still related to the version of TensorRT?
input: "rpn.head.conv.weight"
input: "rpn.head.conv.bias"
input: "feature_0"
output: "1388"
name: "Conv_243"
op_type: "Conv"

@rasbery1 Could you give more information about the error? Furthermore, you can check the output of Step 1 to Step 8 one by one.

@thb1314 The complete error is:
While parsing node number 240 [Conv -> "1388"]:
[error][trt_builder.cpp:30]:NVInfer: tensorrt_code/src/tensorRT/onnx_parser/ModelImporter.cpp:737: --- Begin node ---
[error][trt_builder.cpp:30]:NVInfer: tensorrt_code/src/tensorRT/onnx_parser/ModelImporter.cpp:738: input: "rpn.head.conv.weight"
input: "rpn.head.conv.bias"
input: "feature_0"
output: "1388"
name: "Conv_243"
op_type: "Conv"
attribute {
name: "dilations"
ints: 1
ints: 1
type: INTS
}
attribute {
name: "group"
i: 1
type: INT
}
attribute {
name: "kernel_shape"
ints: 3
ints: 3
type: INTS
}
attribute {
name: "pads"
ints: 1
ints: 1
ints: 1
ints: 1
type: INTS
}
attribute {
name: "strides"
ints: 1
ints: 1
type: INTS
}

[error][trt_builder.cpp:30]:NVInfer: tensorrt_code/src/tensorRT/onnx_parser/ModelImporter.cpp:739: --- End node ---
[error][trt_builder.cpp:30]:NVInfer: tensorrt_code/src/tensorRT/onnx_parser/ModelImporter.cpp:741: ERROR: tensorrt_code/src/tensorRT/onnx_parser/builtin_op_importers.cpp:624 In function importConv:
[8] Assertion failed: (nbSpatialDims == kernelWeights.shape.nbDims - 2) && "The number of spatial dimensions and the kernel shape doesn't match up for the Conv operator."

@rasbery1 OK, could you share your generated ONNX file via Google Drive or Baidu Disk? I will check the ONNX file in Netron.

@rasbery1 You can also check the network structure in Netron and make sure Step 4 and Step 6 are correct.

[image: Netron screenshot of the Conv node with swapped inputs]

@rasbery1 As this picture shows, your ONNX file is wrong: the "X" input of the Conv node cannot be an initializer, and the "B" input (the bias of the conv operation) cannot be "feature_0"; the two inputs should switch positions.
Maybe it is a bug in onnxsim. You can try pip install onnx-simplifier==0.3.6 and execute Step 1 to Step 8 one by one again.
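
To verify this without opening Netron, one can print the input order of the offending node; here is a minimal sketch (the node name Conv_243 and the file name come from the error log above, and the expected ONNX Conv input order is X, W, B):

    import onnx

    model = onnx.load("rpn_backbone_resnet50.onnx")
    for node in model.graph.node:
        if node.op_type == "Conv" and node.name == "Conv_243":
            # A correct node lists the activation first:
            # ['feature_0', 'rpn.head.conv.weight', 'rpn.head.conv.bias']
            print(node.input)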

OK, thanks. Just wondering why the graph in ONNX looks very complicated and what the ONNX simplifier does.

onnx-simplifier is used to merge redundant ops in an ONNX graph. For more information, see https://github.com/daquexian/onnx-simplifier.
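
For reference, a minimal sketch of how the simplifier is typically invoked with the 0.3.x API (the input name and shape here are assumptions taken from the build log later in this thread):

    import onnx
    import onnxsim

    model = onnx.load("rpn_backbone_resnet50.onnx")
    # check_n controls how many random-input runs verify the simplified graph
    model_simp, ok = onnxsim.simplify(
        model,
        check_n=1,
        input_shapes={'input': [1, 3, 608, 800]},
        dynamic_input_shape=False,
    )
    assert ok, "simplified ONNX failed the output check"
    onnx.save(model_simp, "rpn_backbone_resnet50.onnx")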

@thb1314 I'm building the executable "pro" again, and I get this error: /usr/bin/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
Any idea?

@rasbery1 Edit the CUDA path in the CMake config file.

@thb1314 I edited the CMake file but I'm getting this error:
/usr/bin/ld: cannot find -lcublas

Make sure cuDNN and the cuDNN patch (if one exists) are installed, and that the cuDNN path in the CMake config file is correct.
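
As a quick diagnostic, a minimal sketch for checking whether the CUDA libraries the linker complains about are visible on the default search paths at all (illustrative only):

    import ctypes.util

    # Prints the resolved library name if found, or None if the
    # linker would likely fail to find it too.
    for lib in ("cuda", "cublas", "cudnn"):
        print(lib, "->", ctypes.util.find_library(lib))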

Just wondering if the level in the "rpn_boxes" ONNX output node represents the level of the feature map corresponding to each bounding box before NMS.

And does the x06reduceRpnOnnx script remove the ROI pooling layer from the RPN part?

The level in the "rpn_boxes" ONNX output node represents the FPN level of the current bounding box. If level = i (i = 0, 1, 2, 3), the current bounding box was computed from the i-th level anchor boxes in the FPN.

The x06reduceRpnOnnx script removes the first output of the FPN, which is used only for training.

@rasbery1 Excuse me, may I know where you are from?

If the level of the rpn_boxes is already passed, why is the level assignment for ROI Align done in CUDA code? I see that it is done in fasterrcnn_decode.cu; I just want to know the reason.

    // FPN level: floor(4 + log2(sqrt(area) / 224)) - 2, clamped to [0, 3]
    float area = width * height;
    int fpn_lvl = floorf(4 + log2(sqrt(area) / 224) + 1e-6) - 2;

    fpn_lvl = fpn_lvl > 3 ? 3 : fpn_lvl;
    fpn_lvl = fpn_lvl < 0 ? 0 : fpn_lvl;
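
As a worked example of that formula, a minimal Python sketch (not from the repo) that mirrors the CUDA expression above:

    import math

    def fpn_level(width, height):
        # floor(4 + log2(sqrt(area) / 224)) - 2, clamped to [0, 3]
        area = width * height
        lvl = math.floor(4 + math.log2(math.sqrt(area) / 224) + 1e-6) - 2
        return min(max(lvl, 0), 3)

    print(fpn_level(112, 112))  # 1 -> P3
    print(fpn_level(224, 224))  # 2 -> P4
    print(fpn_level(640, 640))  # 3 -> clamped to the top level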

@thb1314 Regarding this issue, I had to change check_n from 0 to 1 in the following command to get the correct node:
model_simp, check = onnxsim.simplify(model, check_n=1, input_shapes={'input': [1, 3, input_height, input_width]},
dynamic_input_shape=False)

Now rpn_backbone_resnet50 is imported but I'm getting another error:
Compile FP32 Onnx Model 'rpn_backbone_resnet50.onnx'.
[info][trt_builder.cpp:557]:Input shape is -1 x 3 x 608 x 800
[info][trt_builder.cpp:558]:Set max batch size = 1
[info][trt_builder.cpp:559]:Set max workspace size = 1024.00 MB
[info][trt_builder.cpp:562]:Network has 1 inputs:
[info][trt_builder.cpp:568]: 0.[input] shape is -1 x 3 x 608 x 800
[info][trt_builder.cpp:574]:Network has 6 outputs:
[info][trt_builder.cpp:579]: 0.[rpn_boxes] shape is 1 x 4390 x 6
[info][trt_builder.cpp:579]: 1.[feature_0] shape is -1 x 256 x 152 x 200
[info][trt_builder.cpp:579]: 2.[feature_1] shape is -1 x 256 x 76 x 100
[info][trt_builder.cpp:579]: 3.[feature_2] shape is -1 x 256 x 38 x 50
[info][trt_builder.cpp:579]: 4.[feature_3] shape is -1 x 256 x 19 x 25
[info][trt_builder.cpp:579]: 5.[feature_pool] shape is -1 x 256 x 10 x 13
[info][trt_builder.cpp:583]:Network has 572 layers:
[info][trt_builder.cpp:650]:Building engine...
[warn][trt_builder.cpp:33]:NVInfer: Detected invalid timing cache, setup a local cache instead
[warn][trt_builder.cpp:33]:NVInfer: GPU error during getBestTactic: Gather_329 : invalid configuration argument
[trt_builder.cpp:30]:NVInfer: 10: [optimizer.cpp::computeCosts::1853] Error Code 10: Internal Error (Could not find any implementation for node Gather_329.)
[trt_builder.cpp:654]:engine is nullptr
[warn][trt_builder.cpp:33]:NVInfer: The logger passed into createInferRuntime differs from one already assigned, 0x55bd42639d00, logger not updated.

[error][fasterrcnn.cpp:164]:Engine rpn_backbone_resnet50.FP32.trtmodel load failed
[error][app_fasterrcnn.cpp:44]:Engine is nullptr

This is the link to the onnx file. Could you please help me with this issue?
https://drive.google.com/file/d/14INj1ut3di6kuEKtbAN5iKYuKGGdQFwW/view?usp=sharing

@rasbery1 Excuse me, may I know where you are from?
Armenia

I would appreciate it if you could help me with the mentioned issue.

@rasbery1 fpn_lvl is used to select the level of the RPN's output features, which are passed to the header. level in the code indicates the level of the anchor predefined in the FPN; it can be viewed as a "class" of the bbox and is used for NMS in the RPN.

feature_pool should be removed in the next step. Execute Step 1 to Step 8 one by one again. Furthermore, don't forget to delete the '*.trtmodel' files in the workspace dir if they exist.

@thb1314 Thank you very much for your help. I ran the Python scripts from Step 1 to Step 8, but I noticed this problem: the rpn_backbone_resnet50 generated in Step 3 looks good, i.e. X in Conv node 243 is feature_0 and W and B are shown as weights and bias, but after executing Step 6, which removes feature_pool, I get the wrong rpn_backbone_resnet50.onnx file again, as before (X is shown as weights and B becomes feature_0 in the Conv node). I'm not sure what the problem is. Maybe it is related to the graph.cleanup() call, but I'm not sure. Could you help me with it?

@rasbery1 I tested the Step 6 script in my Python 3.7 conda env from scratch and didn't replicate the bug. Maybe you can create a clean env and try again.
See below for another solution.
Use the following code to replace the Step 6 script:

import onnx


def cutOnnx():
    onnx_save_path = "rpn_backbone_resnet50.onnx"
    onnx_model = onnx.load(onnx_save_path)

    # collect the "feature_pool" entry among the graph outputs
    removed_list = list()
    for item in onnx_model.graph.output:
        if item.name == "feature_pool":
            removed_list.append(item)

    # remove feature_pool from the graph outputs and save the model
    for item in removed_list:
        onnx_model.graph.output.remove(item)
    print(onnx_model.graph.output)

    onnx.save(onnx_model, onnx_save_path)


if __name__ == '__main__':
    cutOnnx()

@thb1314 Thank you so much. I got the correct graph, but when creating the TRT engine for the RPN part I'm getting the following error. This does not happen when generating the engine for new_header.onnx:
[warn][trt_builder.cpp:33]:NVInfer: Detected invalid timing cache, setup a local cache instead
[warn][trt_builder.cpp:33]:NVInfer: GPU error during getBestTactic: Gather_329 : invalid configuration argument
[error][trt_builder.cpp:30]:NVInfer: 10: [optimizer.cpp::computeCosts::1853] Error Code 10: Internal Error (Could not find any implementation for node Gather_329.)
[error][trt_builder.cpp:654]:engine is nullptr

@rasbery1 Which ONNX file?
A Gather op should not exist in the ONNX graph.
Execute Steps 5 to 8 and replace the node names in Step 7 as illustrated there.
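
A minimal sketch for checking whether any Gather nodes remain in a graph (illustrative; the file name follows the rest of this thread):

    import onnx

    model = onnx.load("rpn_backbone_resnet50.onnx")
    gathers = [n.name for n in model.graph.node if n.op_type == "Gather"]
    print(gathers if gathers else "no Gather nodes found")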

This is related to rpn_backbone_resnet50.onnx, which comes from x06reduceRpnOnnx.py. I'm using the script that you sent, and it still has Gather nodes. Could you please help me remove them?

def cutOnnx():
    onnx_save_path = "rpn_backbone_resnet50.onnx"
    onnx_model = onnx.load(onnx_save_path)
    removed_list = list()
    for item in onnx_model.graph.output:
        if item.name == "feature_pool":
            removed_list.append(item)
    for item in removed_list:
        onnx_model.graph.output.remove(item)
    # print(onnx_model.graph.output)

    # remove feature pool
    onnx.save(onnx_model, onnx_save_path)

if __name__ == '__main__':
    cutOnnx()

@rasbery1 I am sorry, I drew the wrong conclusion. The Gather op parsing error may result from the onnxparser version. You need to verify your TensorRT version and onnxparser according to https://github.com/thb1314/tensorrt-onnx-fasterrcnn-fpn-roialign/tree/master/tensorrt_code#setup-and-configuration

@rasbery1 I have tested the onnx file rpn_backbone_resnet50.onnx in my TRT 8.0.3.4 env.

[image: onnxparser]
@thb1314 My TRT version is also 8.0.3.4. For the onnx parser, I replaced src/tensorRT/onnx_parser with onnx_parser_for_8.x/onnx_parser according to these instructions https://github.com/thb1314/tensorrt-onnx-fasterrcnn-fpn-roialign/tree/master/tensorrt_code#setup-and-configuration, but I'm still getting the error for the Gather node in the RPN ONNX file. Could you please guide me on how to fix it?

@rasbery1 Could you share your ONNX file?

@thb1314 Just gave you the access

@thb1314 Could you check it now?

@rasbery1 Ok, I have got it.

@thb1314 Just wondering if you got a chance to look at it.

@rasbery1
I tested the ONNX file you provided in my TensorRT environment and got the correct result.
Maybe you can check your protobuf version; it is 3.11.4 in my environment.
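
A quick way to print the installed protobuf version, for comparison:

    import google.protobuf

    print(google.protobuf.__version__)  # 3.11.4 in the working environment above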

Thanks, I'll change the version of protobuf to see how it goes.

@thb1314 It was fixed, thanks. The engines are generated, but now I'm getting these errors from fasterrcnn_decode.cu during inference. I would appreciate it if you could guide me to fix this too:

[error][preprocess_kernel.cu:385]:launch failed: no kernel image is available for execution on the device
[error][preprocess_kernel.cu:385]:launch failed: no kernel image is available for execution on the device
[error][fasterrcnn_decode.cu:203]:launch failed: no kernel image is available for execution on the device
[error][fasterrcnn_decode.cu:207]:launch failed: no kernel image is available for execution on the device
[error][trt_tensor.cpp:224]:Offset location[0] >= bytes_[0], out of range
[error][trt_tensor.cpp:224]:Offset location[0] >= bytes_[0], out of range
[error][trt_builder.cpp:30]:NVInfer: 3: [executionContext.cpp::setBindingDimensions::970] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::970, condition: profileMinDims.d[i] <= dimensions.d[i]. Supplied binding dimension [0,4] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 5120, minimum dimension in profile is 1, but supplied dimension is 0.
)
[error][trt_builder.cpp:30]:NVInfer: 3: [executionContext.cpp::setBindingDimensions::970] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::970, condition: profileMinDims.d[i] <= dimensions.d[i]. Supplied binding dimension [0,256,7,7] for bindings[1] exceed min ~ max range at index 0, maximum dimension in profile is 5120, minimum dimension in profile is 1, but supplied dimension is 0.
)
[error][trt_builder.cpp:30]:NVInfer: 3: [executionContext.cpp::enqueueInternal::322] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr
)
[fatal][trt_infer.cpp:340]:execute fail, code 209[cudaErrorNoKernelImageForDevice], message no kernel image is available for execution on the device
[error][fasterrcnn_decode.cu:111]:launch failed: no kernel image is available for execution on the device
[error][fasterrcnn_decode.cu:115]:launch failed: no kernel image is available for execution on the device
[error][fasterrcnn_decode.cu:203]:launch failed: no kernel image is available for execution on the device
[error][fasterrcnn_decode.cu:207]:launch failed: no kernel image is available for execution on the device
[error][trt_tensor.cpp:224]:Offset location[0] >= bytes_[0], out of range
[error][trt_tensor.cpp:224]:Offset location[0] >= bytes_[0], out of range
[error][trt_builder.cpp:30]:NVInfer: 3: [executionContext.cpp::setBindingDimensions::970] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::970, condition: profileMinDims.d[i] <= dimensions.d[i]. Supplied binding dimension [0,4] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 5120, minimum dimension in profile is 1, but supplied dimension is 0.
)
[error][trt_builder.cpp:30]:NVInfer: 3: [executionContext.cpp::setBindingDimensions::970] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::970, condition: profileMinDims.d[i] <= dimensions.d[i]. Supplied binding dimension [0,256,7,7] for bindings[1] exceed min ~ max range at index 0, maximum dimension in profile is 5120, minimum dimension in profile is 1, but supplied dimension is 0.
)
[error][trt_builder.cpp:30]:NVInfer: 3: [executionContext.cpp::enqueueInternal::322] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr
)

@rasbery1 "no kernel image is available for execution on the device": it seems that your TRT version doesn't match your CUDA version.
And was the last bug caused by the protobuf version?

@thb1314 I used trtpy get-env and ran the code in the downloaded env, which is trt8cuda112cudnn8; I'm not sure how to figure out the exact TensorRT 8 version in the mentioned env.

@rasbery1 The trtpy get-env command fetches the latest trtpy environment; maybe its version doesn't match the current onnxparser.
Following the instructions at https://github.com/thb1314/tensorrt-onnx-fasterrcnn-fpn-roialign/tree/master/tensorrt_code#setup-and-configuration is the best choice.

I will add support for the latest trtpy in the future.

Just wondering if level_index is related to the FPN level each box belongs to?
const int level_index = int(offset_bottom_rois[roi_cols - 2]);

in tensorrt_code/src/application/app_fasterrcnn/fasterrcnn.cpp, lines 287-299:

for(int i = 0; i < count; ++i) {
    float* pbox  = parray + 1 + i * RPN_NUM_BOX_ELEMENT;
    int keepflag = pbox[6];
    if(keepflag == 1) {
        // left, top, right, bottom, score, level, keepflag, fpn_level, batch_index
        roi_align_inputs_cpu_ptr[roi_align_inputs_index++] = pbox[0];
        roi_align_inputs_cpu_ptr[roi_align_inputs_index++] = pbox[1];
        roi_align_inputs_cpu_ptr[roi_align_inputs_index++] = pbox[2];
        roi_align_inputs_cpu_ptr[roi_align_inputs_index++] = pbox[3];
        roi_align_inputs_cpu_ptr[roi_align_inputs_index++] = pbox[7];
        roi_align_inputs_cpu_ptr[roi_align_inputs_index++] = pbox[8];
    }
}

roi_align_inputs has 6 columns corresponding to "left, top, right, bottom, fpn_level, batch_index".
level_index is the fpn_level calculated from the area of the bbox.

I see that the shape of the proposals during inference is proposals: shape {5120 x 4}. I was thinking that initially the number of boxes is 4390, and after NMS and ROI alignment it becomes 1000, which is the input of the header part. Why is the shape not 1000 x 4 instead?

@rasbery1 The output of the RPN network is dynamic due to NMS. "1000" is an artificial maximum number of bboxes, which can be modified in the code.

@rasbery1 No more questions?

@thb1314 Is the inference time around 38 ms? How does it compare to when we don't use TensorRT?

@rasbery1 As long as it works, you can compare the speed with the PyTorch Python API or TorchScript.
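
For instance, a rough timing sketch for the torchvision baseline (illustrative assumptions: the model variant, warm-up count, and 608x800 input are not taken from this repo, and numbers vary by GPU):

    import time
    import torch
    import torchvision

    # torchvision's reference Faster R-CNN (ResNet50 + FPN) as the PyTorch baseline
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval().cuda()
    images = [torch.rand(3, 608, 800, device="cuda")]

    with torch.no_grad():
        for _ in range(5):            # warm-up
            model(images)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(20):
            model(images)
        torch.cuda.synchronize()
        print((time.time() - start) / 20 * 1000, "ms per image")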