fabio-sim/LightGlue-ONNX

Building SuperPoint ONNX file to TensorRT engine fails with Error (Could not find any implementation for node {ForeignNode[/Flatten.../Transpose_3]})

weihaoysgs opened this issue · 5 comments

@fabio-sim Hi, I want to convert the SuperPoint ONNX file to a TensorRT engine, but when I build it via the C++ interface, I encounter the following problem.

[12/16/2023-15:58:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 19, GPU 2905 (MiB)
[12/16/2023-15:58:39] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1450, GPU +242, now: CPU 1545, GPU 3133 (MiB)
[12/16/2023-15:58:39] [I] [TRT] ----------------------------------------------------------------
[12/16/2023-15:58:39] [I] [TRT] Input filename:   /home/weihao/workspace/lightglue_ws/LightGlue-ONNX/weights/superpoint.onnx
[12/16/2023-15:58:39] [I] [TRT] ONNX IR version:  0.0.8
[12/16/2023-15:58:39] [I] [TRT] Opset version:    17
[12/16/2023-15:58:39] [I] [TRT] Producer name:    pytorch
[12/16/2023-15:58:39] [I] [TRT] Producer version: 2.1.0
[12/16/2023-15:58:39] [I] [TRT] Domain:           
[12/16/2023-15:58:39] [I] [TRT] Model version:    0
[12/16/2023-15:58:39] [I] [TRT] Doc string:       
[12/16/2023-15:58:39] [I] [TRT] ----------------------------------------------------------------
[12/16/2023-15:58:39] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/16/2023-15:58:39] [I] [TRT] Graph optimization time: 0.00282692 seconds.
[12/16/2023-15:58:39] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[12/16/2023-15:58:50] [E] [TRT] 10: Could not find any implementation for node {ForeignNode[/Flatten.../Transpose_3]}.
[12/16/2023-15:58:50] [E] [TRT] 10: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/Flatten.../Transpose_3]}.)

Maybe some torch operation is not supported in TensorRT?
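For reference, a quick way to check which op types the exported graph actually contains is the onnx Python package (a small sketch, with the model path assumed from fabio-sim's default below):

import onnx

# Load the exported model and collect the distinct op types in its graph.
model = onnx.load("weights/superpoint.onnx")
op_types = sorted({node.op_type for node in model.graph.node})
print(op_types)  # look for ops with data-dependent output shapes, e.g. NonZero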

Also, I have tried the SuperPoint network definition that can be found in superpoint.py.

I converted that ONNX file to an engine file successfully, but the post-processing and return values are a little different.

Hi @weihaoysgs

I haven't seen that error before. I looked through the ONNX graph using Netron, but I couldn't find any node named ForeignNode[/Flatten.../Transpose_3]. Are you using this model? https://github.com/fabio-sim/LightGlue-ONNX/releases/download/v1.0.0/superpoint.onnx

Hi @fabio-sim, thank you for your reply!
I also tested the ONNX file you provided, but got the same error. I also tried to find the {ForeignNode[/Flatten.../Transpose_3]} node in Netron, but maybe only the following result shows up?


However, I have found that the torch.where() operation may not be supported by the TensorRT engine.

When I update the SuperPoint post-processing code as follows:

# Extract keypoints
# best_kp = torch.where(scores > self.conf["detection_threshold"])
# scores = scores[best_kp]
scores = scores  # no-op placeholder: keep the dense score map

# keypoints = torch.stack(best_kp[1:3], dim=-1)
keypoints = torch.rand((10, 2))  # dummy keypoints just to keep the export shape-valid

Of course, this change makes the program's output completely wrong, but it does suggest that the problem lies with the torch.where() operation, because the exported ONNX file converted successfully after this change.

So I wonder: are there other ways to achieve the same effect as torch.where while remaining supported by TensorRT?

I'm not too familiar with torch.where under TensorRT, but in that context, where it's used to index into another tensor, it's usually converted into a combination of NonZero and Gather ops in ONNX.
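To illustrate, the indexing pattern in the SuperPoint post-processing looks roughly like this (a sketch with assumed shapes and threshold); the keypoint count N depends on the data, which is what makes the resulting NonZero/Gather subgraph hard for TensorRT to plan:

import torch

scores = torch.rand(1, 480, 640)               # dense keypoint score map
best_kp = torch.where(scores > 0.0005)         # tuple of index tensors -> NonZero in ONNX
scores = scores[best_kp]                       # advanced indexing -> Gather ops, shape (N,)
keypoints = torch.stack(best_kp[1:3], dim=-1)  # (N, 2) row/column indices, N is data-dependent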

I've tested that this model can be converted into an engine using the following code:

import tensorrt as trt  # >= 8.6.1

def build_engine(
    model_path: str = "weights/superpoint.onnx",
    output_path: str = "weights/superpoint.engine",
):
    logger = trt.Logger(trt.Logger.WARNING)

    builder = trt.Builder(logger)

    # Explicit-batch network definition (required for ONNX parsing)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )

    parser = trt.OnnxParser(network, logger)

    success = parser.parse_from_file(model_path)
    for idx in range(parser.num_errors):
        print(parser.get_error(idx))

    if not success:
        raise RuntimeError("Failed to parse the ONNX model.")

    config = builder.create_builder_config()

    # Dynamic-shape profile: (min, opt, max) shapes for each dynamic input
    profile = builder.create_optimization_profile()

    for name in ["image"]:
        profile.set_shape(
            name,
            (1, 1, 128, 128),  # min
            (1, 1, 256, 256),  # opt
            (1, 1, 512, 512),  # max
        )

    config.add_optimization_profile(profile)

    serialized_engine = builder.build_serialized_network(network, config)

    with open(output_path, "wb") as f:
        f.write(serialized_engine)

So it could also have something to do with differences between the TensorRT C++ and Python APIs.
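As a quick sanity check (a minimal sketch, assuming the engine file written by the script above), the serialized engine can be deserialized back with the Python runtime:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
# Deserialize the engine; a None result means the plan file is invalid.
with open("weights/superpoint.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
assert engine is not None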

@fabio-sim Hi, thank you for your reply. I will verify it and get back to you as soon as possible.

@fabio-sim Hi, thank you for your suggestion. I have solved this problem by updating the SuperPoint model definition: I separated out the parts that require network inference. Now the inference part of SuperPoint looks like this:

  def forward(self, data):
      """Compute keypoints, scores, descriptors for image"""
      image = data
      # Shared Encoder
      x = self.relu(self.conv1a(image))
      x = self.relu(self.conv1b(x))
      x = self.pool(x)
      x = self.relu(self.conv2a(x))
      x = self.relu(self.conv2b(x))
      x = self.pool(x)
      x = self.relu(self.conv3a(x))
      x = self.relu(self.conv3b(x))
      x = self.pool(x)
      x = self.relu(self.conv4a(x))
      x = self.relu(self.conv4b(x))

      # Compute the dense keypoint scores
      cPa = self.relu(self.convPa(x))
      scores = self.convPb(cPa)
      scores = torch.nn.functional.softmax(scores, 1)[:, :-1]
      b, _, h, w = scores.shape
      scores = scores.permute(0, 2, 3, 1).reshape(b, h, w, 8, 8)
      scores = scores.permute(0, 1, 3, 2, 4).reshape(b, h * 8, w * 8)
      scores = simple_nms(scores, default_conf["nms_radius"])

      # Compute the dense descriptors
      cDa = self.relu(self.convDa(x))
      descriptors = self.convDb(cDa)
      descriptors = torch.nn.functional.normalize(descriptors, p=2, dim=1)

      return scores, descriptors

Then the remaining post-processing can be implemented in either Python or C++. At the same time, the number of keypoints and the confidence threshold can be changed dynamically via configuration parameters.
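For completeness, here is a minimal Python sketch of what that external post-processing could look like (a hypothetical helper, simplified relative to the original sample_descriptors; names, shapes, and the default threshold are assumptions):

import torch

def postprocess(scores, descriptors, threshold=0.0005):
    # scores: (1, H, W) and descriptors: (1, 256, H/8, W/8) from the engine
    b, y, x = torch.where(scores > threshold)        # dynamic keypoint count N
    keypoints = torch.stack([x, y], dim=-1).float()  # (N, 2) in (x, y) order
    kp_scores = scores[b, y, x]                      # (N,)

    # Bilinearly sample the coarse descriptor map at the keypoint locations.
    h, w = scores.shape[1], scores.shape[2]
    grid = keypoints.view(1, 1, -1, 2) / keypoints.new_tensor([w - 1, h - 1]) * 2 - 1
    desc = torch.nn.functional.grid_sample(
        descriptors, grid, mode="bilinear", align_corners=True
    )  # (1, 256, 1, N)
    desc = torch.nn.functional.normalize(desc.reshape(256, -1), p=2, dim=0)
    return keypoints, kp_scores, desc.t()            # (N, 2), (N,), (N, 256)

Since torch.where now runs in plain PyTorch (or its C++ equivalent) outside the engine, TensorRT never sees the NonZero/Gather subgraph.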

If there are any other questions, I will open a new issue.