Failed to test the end-to-end ONNX model with trtexec

Hi @fabio-sim, I'm trying to test the superpoint_lightglue_end2end_fused.onnx model and convert it into a TensorRT engine with TensorRT-8.6.1.6, but the MultiHeadAttention operator is not registered. I have tried TensorRT OSS, but it didn't help. Could you help me solve this problem or suggest a reasonable way to convert the model into an engine?

```
[10/27/2023-09:38:30] [I] [TRT] No importer registered for op: MultiHeadAttention. Attempting to import as plugin.
[10/27/2023-09:38:30] [I] [TRT] Searching for plugin: MultiHeadAttention, plugin_version: 1, plugin_namespace: 
[10/27/2023-09:38:30] [E] [TRT] 3: getPluginCreator could not find plugin: MultiHeadAttention version: 1
[10/27/2023-09:38:30] [E] [TRT] parsers/onnx/ModelImporter.cpp:768: While parsing node number 509 [MultiHeadAttention -> "/lightglue/transformers.0/self_attn/Reshape_5_output_0"]:
[10/27/2023-09:38:30] [E] [TRT] parsers/onnx/ModelImporter.cpp:769: --- Begin node ---
[10/27/2023-09:38:30] [E] [TRT] parsers/onnx/ModelImporter.cpp:770: input: "Transpose_0_out"
output: "/lightglue/transformers.0/self_attn/Reshape_5_output_0"
name: "MultiHeadAttention_0"
op_type: "MultiHeadAttention"
attribute {
  name: "num_heads"
  i: 4
  type: INT
}
domain: "com.microsoft"

[10/27/2023-09:38:30] [E] [TRT] parsers/onnx/ModelImporter.cpp:771: --- End node ---
[10/27/2023-09:38:30] [E] [TRT] parsers/onnx/ModelImporter.cpp:773: ERROR: parsers/onnx/builtin_op_importers.cpp:5405 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[10/27/2023-09:38:30] [E] Failed to parse onnx file
[10/27/2023-09:38:30] [I] Finished parsing network model. Parse time: 0.0809233
[10/27/2023-09:38:30] [E] Parsing model failed
[10/27/2023-09:38:30] [E] Failed to create engine from model or file.
[10/27/2023-09:38:30] [E] Engine set up failed
```

Hey, I tried to use trtexec to convert superpoint_lightglue.onnx to a .trt file, and it reported success. But when I tried to test the .trt engine, an error occurred.

I thought the error occurred while using trtexec.

Yes. It seems that superpoint_lightglue.onnx passes the test, but after optimization with optimize.py, which produces superpoint_lightglue_fused.onnx, some layers in the network are fused into a single MultiHeadAttention node. That op belongs to the com.microsoft domain (ONNX Runtime's contrib operators), which TensorRT's ONNX parser does not currently support.
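To check in advance whether a model contains ops that TensorRT's ONNX parser will not recognize, one can list the nodes whose domain is not the default ONNX domain; the log above shows the failing node is in `com.microsoft`. The following is a minimal sketch, assuming the `onnx` Python package is installed; the helper `find_custom_domain_ops` is hypothetical and only inspects top-level graph nodes (not subgraphs of `If`/`Loop`).

```python
import sys
from collections import Counter
from typing import Iterable


def find_custom_domain_ops(nodes: Iterable) -> Counter:
    """Count (domain, op_type) pairs for nodes outside the default ONNX domain.

    Each node only needs `.domain` and `.op_type` attributes, as onnx.NodeProto has.
    The default ONNX domain is written as "" (or "ai.onnx").
    """
    counts = Counter()
    for node in nodes:
        if node.domain not in ("", "ai.onnx"):
            counts[(node.domain, node.op_type)] += 1
    return counts


def main(model_path: str) -> None:
    import onnx  # assumption: the onnx package is installed

    model = onnx.load(model_path)
    for (domain, op_type), n in find_custom_domain_ops(model.graph.node).items():
        print(f"{domain}::{op_type} x{n}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # pass the path of the .onnx file to inspect
```

On the fused model this should report the `com.microsoft::MultiHeadAttention` nodes; an empty result suggests no non-standard-domain ops remain (custom plugins in other namespaces would still need checking separately).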

About your error here: following the error message's suggestion and using set_input_shape instead of set_binding_shape may help.
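In the TensorRT 8.5+ Python API, input shapes are set per tensor name with `IExecutionContext.set_input_shape`, which supersedes the deprecated index-based `set_binding_shape`. A minimal sketch, assuming the input names `kpts0`, `kpts1`, `desc0`, `desc1` used by the LightGlue engine; the helper `set_input_shapes` is hypothetical and works with any object that exposes `set_input_shape(name, shape)`.

```python
from typing import Any, Mapping

# Input tensor names used by the LightGlue ONNX export (assumption for this sketch).
INPUT_NAMES = ("kpts0", "kpts1", "desc0", "desc1")


def set_input_shapes(context: Any, arrays: Mapping[str, Any]) -> None:
    """Set the runtime shape of every named input on an execution context.

    `context` is expected to be a tensorrt.IExecutionContext (TensorRT >= 8.5);
    `arrays` maps input tensor names to arrays that expose `.shape`.
    """
    for name in INPUT_NAMES:
        shape = tuple(arrays[name].shape)
        # set_input_shape returns False when the shape is rejected,
        # e.g. because it lies outside the active optimization profile.
        if not context.set_input_shape(name, shape):
            raise ValueError(f"invalid shape {shape} for input {name!r}")
```

Usage would look like `set_input_shapes(context, {"kpts0": kpts0, "kpts1": kpts1, "desc0": desc0, "desc1": desc1})`, called before running inference.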

Yes, thanks. I use set_binding_shape to set the input shapes, but it does not work when it comes to the outputs:

```python
inputs = [kpts0, kpts1, desc0, desc1]
self.bufferH = []
self.bufferD = []

output_shape = [(num_matches, 2), (num_matches,)]

# Bind output shapes
# output_matches_binding_idx = self.engine.get_binding_index("matches0")
# output_scores_binding_idx = self.engine.get_binding_index("mscores0")
# self.context.set_binding_shape(output_matches_binding_idx, tuple(output_shape[0]))
# self.context.set_binding_shape(output_scores_binding_idx, tuple(output_shape[1]))

for i in range(self.nInput):
    self.bufferH.append(np.ascontiguousarray(inputs[i]))
    self.context.set_binding_shape(i, tuple(inputs[i].shape))
```
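Output shapes are not set on the execution context; TensorRT derives them from the input shapes. After all inputs are set, `IExecutionContext.get_tensor_shape(name)` reports each output's shape, and a dimension of -1 indicates a data-dependent size (here plausibly the number of matches), which is only known after inference and in TensorRT 8.5+ is handled through an `IOutputAllocator`. A minimal sketch, assuming the output names `matches0` and `mscores0` from the snippet above; the helpers `output_shapes` and `is_data_dependent` are hypothetical.

```python
from typing import Any, Dict, Sequence, Tuple

# Output tensor names from the snippet above (assumption for this sketch).
OUTPUT_NAMES = ("matches0", "mscores0")


def output_shapes(
    context: Any, names: Sequence[str] = OUTPUT_NAMES
) -> Dict[str, Tuple[int, ...]]:
    """Query output shapes; call only after every input shape has been set."""
    return {name: tuple(context.get_tensor_shape(name)) for name in names}


def is_data_dependent(shape: Tuple[int, ...]) -> bool:
    """A negative (-1) dimension means the size is only known after execution."""
    return any(dim < 0 for dim in shape)
```

If `is_data_dependent` is true for an output, a fixed-size buffer sized from the profile's maximum shapes is one workaround; the exact number of valid rows then has to come from elsewhere.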

Hi @BoyceL @Albert337, thank you for your interest in LightGlue-ONNX.

I haven't had a chance to look at using trtexec yet. I'll check it out in the coming weeks.

Thanks. Take your time.

Sorry, I have no clue about your error here; I've only been learning this for a few weeks :(

Hi, I've added the TRT engine-convertible ONNX model here. You can also refer to #56 and this script:

`LightGlue-ONNX/trt_infer.py`, lines 1 to 122 at commit bcf96b7:

```python
"""Sample code to build and run LightGlue TensorRT engine inference."""
import numpy as np
import tensorrt as trt  # >= 8.6.1
import torch

import trt_utils.common as common
from lightglue_onnx import SuperPoint
from lightglue_onnx.end2end import normalize_keypoints
from lightglue_onnx.utils import load_image, rgb_to_grayscale


def build_engine(
    model_path: str, output_path: str, num_keypoints: int = 512, desc_dim: int = 256
):
    logger = trt.Logger(trt.Logger.WARNING)

    builder = trt.Builder(logger)

    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )

    parser = trt.OnnxParser(network, logger)

    success = parser.parse_from_file(model_path)
    for idx in range(parser.num_errors):
        print(parser.get_error(idx))

    if not success:
        raise Exception

    config = builder.create_builder_config()

    profile = builder.create_optimization_profile()

    for name in ["kpts0", "kpts1"]:
        profile.set_shape(
            name,
            (1, 32, 2),
            (1, num_keypoints // 2, 2),
            (1, num_keypoints, 2),
        )
    for name in ["desc0", "desc1"]:
        profile.set_shape(
            name,
            (1, 32, desc_dim),
            (1, num_keypoints // 2, desc_dim),
            (1, num_keypoints, desc_dim),
        )

    config.add_optimization_profile(profile)

    serialized_engine = builder.build_serialized_network(network, config)

    with open(output_path, "wb") as f:
        f.write(serialized_engine)


def load_inputs(
    input_buffers,
    img_size=512,
    img0_path="assets/sacre_coeur1.jpg",
    img1_path="assets/sacre_coeur2.jpg",
    max_num_keypoints=512,
):
    image0, scales0 = load_image(img0_path, resize=img_size)
    image1, scales1 = load_image(img1_path, resize=img_size)
    image0 = rgb_to_grayscale(image0)
    image1 = rgb_to_grayscale(image1)
    extractor = SuperPoint(max_num_keypoints=max_num_keypoints).eval()

    with torch.inference_mode():
        feats0, feats1 = extractor(image0[None]), extractor(image1[None])
        kpts0, scores0, desc0 = feats0
        kpts1, scores1, desc1 = feats1

        kpts0 = normalize_keypoints(kpts0, image0.shape[1], image0.shape[2])
        kpts1 = normalize_keypoints(kpts1, image1.shape[1], image1.shape[2])

    for i, tensor in zip(input_buffers, [kpts0, kpts1, desc0, desc1]):
        np.copyto(i.host, tensor.numpy().ravel())

    return {
        "kpts0": kpts0.shape,
        "kpts1": kpts1.shape,
        "desc0": desc0.shape,
        "desc1": desc1.shape,
    }


def run_engine(engine_path: str):
    logger = trt.Logger(trt.Logger.WARNING)

    with open(engine_path, "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

    # TODO: Dynamic output shapes
    inputs, outputs, bindings, stream = common.allocate_buffers(engine, profile_idx=0)

    shapes = load_inputs(inputs)
    context = engine.create_execution_context()

    for name, shape in shapes.items():
        context.set_input_shape(name, tuple(shape))

    trt_outputs = common.do_inference_v2(
        context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream
    )

    matches0, mscores0 = trt_outputs

    return matches0, mscores0


if __name__ == "__main__":
    model_path = "weights/superpoint_lightglue.trt.onnx"
    output_path = "weights/superpoint_lightglue.engine"

    build_engine(model_path, output_path)
    matches0, mscores0 = run_engine(output_path)

    print(matches0.reshape(512, 2))
```

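Since the thread started from `trtexec`, the same optimization profile that `build_engine` sets up (min 32, opt `num_keypoints // 2`, max `num_keypoints` keypoints) can also be passed to `trtexec` through its `--minShapes`, `--optShapes` and `--maxShapes` flags. A minimal sketch that only assembles the command line, assuming `trtexec` is on `PATH` and the input names and dimensions from the script above; the helper `trtexec_command` is hypothetical and has not been verified against this model.

```python
import shlex
from typing import List


def _shape_spec(num_kpts: int, desc_dim: int) -> str:
    """Format shapes as trtexec expects: name:AxBxC entries, comma-separated."""
    specs = [f"{name}:1x{num_kpts}x2" for name in ("kpts0", "kpts1")]
    specs += [f"{name}:1x{num_kpts}x{desc_dim}" for name in ("desc0", "desc1")]
    return ",".join(specs)


def trtexec_command(
    onnx_path: str,
    engine_path: str,
    num_keypoints: int = 512,
    desc_dim: int = 256,
    min_keypoints: int = 32,
) -> List[str]:
    """Mirror build_engine()'s optimization profile as trtexec arguments."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--minShapes={_shape_spec(min_keypoints, desc_dim)}",
        f"--optShapes={_shape_spec(num_keypoints // 2, desc_dim)}",
        f"--maxShapes={_shape_spec(num_keypoints, desc_dim)}",
    ]


if __name__ == "__main__":
    cmd = trtexec_command(
        "weights/superpoint_lightglue.trt.onnx", "weights/superpoint_lightglue.engine"
    )
    print(shlex.join(cmd))  # or run it with subprocess.run(cmd, check=True)
```

Keeping the shape ranges in one helper makes it easier to keep the `trtexec` build and the Python `build_engine` profile consistent.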