fabio-sim/LightGlue-ONNX

How can I improve the inference speed of LightGlue?

Closed this issue · 3 comments

zwl995 commented

Hello, I used the code you provided to export superpoint.onnx and lightglue.onnx, and then ran inference with ONNX Runtime in C++. The number of keypoints to extract is set to 2000, and I want to integrate this into a SLAM system. SuperPoint inference is fast enough, taking only tens of milliseconds, but LightGlue needs more than 200 milliseconds. I enabled mp and flash when exporting, but the speed did not change. Can you give me some suggestions for improving the speed of LightGlue? Here is the code that initializes the inference environment:
```cpp
env_name_ = env_name;
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, env_name_.c_str());
Ort::SessionOptions sessionOptions;

// Set up the CUDA execution provider
OrtCUDAProviderOptions options;
options.device_id = 0;
sessionOptions.AppendExecutionProvider_CUDA(options);

// Enable memory-pattern optimization and sequential execution
sessionOptions.EnableMemPattern();
sessionOptions.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);

sessionOptions.SetIntraOpNumThreads(num_threads);
sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
lg_session_ = std::shared_ptr<Ort::Session>(new Ort::Session(env, lg_onnx_path.c_str(), sessionOptions));
```
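As a side note, a quick way to isolate LightGlue's latency is to benchmark the exported session from Python. The sketch below is only an illustration and makes assumptions: every dynamic input dimension is guessed as 2000 and every input is assumed to be float32, so adapt the feeds to the real inputs reported by `session.get_inputs()`.

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical latency probe for the exported LightGlue model (not code from
# this repo). Dynamic dimensions and dtypes are guessed; adjust as needed.
session = ort.InferenceSession(
    "lightglue.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

feeds = {}
for inp in session.get_inputs():
    # Substitute a guess (e.g., 2000 keypoints) for each dynamic dimension.
    shape = [d if isinstance(d, int) else 2000 for d in inp.shape]
    feeds[inp.name] = np.random.randn(*shape).astype(np.float32)

session.run(None, feeds)  # warm-up run (CUDA kernel/workspace setup)

start = time.perf_counter()
for _ in range(20):
    session.run(None, feeds)
print(f"{(time.perf_counter() - start) / 20 * 1000:.1f} ms/run")
```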

fabio-sim commented

Hi @zwl995, thank you for your interest in LightGlue-ONNX.

I think mixed-precision is faster only on certain hardware configurations.

One suggestion is that LightGlue allows you to trade accuracy for speed by reducing the number of layers. By default, the model is exported with the full 9 layers, but you can make it use only, for example, 4 layers by changing this line:

```python
for i in range(self.conf.n_layers):
```

to `for i in range(4):`.
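To make the effect of that edit concrete, here is a self-contained toy sketch of the pattern; this is not LightGlue's actual code, and the module and variable names below are invented for illustration:

```python
import torch
from torch import nn

# Toy illustration (NOT LightGlue's real code): run only the first k layers
# of a deeper attention stack, which is what the one-line edit above does.
class TruncatedStack(nn.Module):
    def __init__(self, n_layers: int = 9, dim: int = 256, n_layers_to_run: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )
        self.n_layers_to_run = n_layers_to_run  # plays the role of the hard-coded 4

    def forward(self, desc: torch.Tensor) -> torch.Tensor:
        for i in range(self.n_layers_to_run):  # instead of range(n_layers)
            desc = self.layers[i](desc)
        return desc

x = torch.randn(1, 2000, 256)  # e.g., descriptors for 2000 keypoints
print(TruncatedStack().eval()(x).shape)  # torch.Size([1, 2000, 256])
```

Since each layer is a full attention block, latency should drop roughly in proportion to the layers removed, at some cost in matching accuracy.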

PyTorch 2.1 will be released in a few more weeks, along with the new torch.export and torch.onnx.dynamo_export features. I'm not sure if these will offer any substantial speedups or support exporting control flow, but I'll revisit the topic of optimizing LightGlue further then.
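For reference, the two new entry points look roughly like this; this is a sketch against the pre-release 2.1 API (which may still change), using a stand-in model rather than LightGlue:

```python
import torch

# Sketch of the new PyTorch 2.1 export paths; `model` and `example_input`
# are stand-ins, and the pre-release API may still change.
model = torch.nn.Linear(4, 2).eval()
example_input = torch.randn(1, 4)

# torch.export captures the graph as an ExportedProgram (limited control flow)
exported_program = torch.export.export(model, (example_input,))

# torch.onnx.dynamo_export is the new TorchDynamo-based ONNX exporter
onnx_program = torch.onnx.dynamo_export(model, example_input)
onnx_program.save("model.onnx")
```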

zwl995 commented

Ok, thank you very much. I will try reducing the number of layers along with your other suggestions, hopefully without too much loss of accuracy.

fabio-sim commented

Hi @zwl995,

I did some further optimizations to LightGlue recently (Multi-head Attention fusion). You can try these new models out at https://github.com/fabio-sim/LightGlue-ONNX/releases/tag/v1.0.0 and see if there are any speed improvements.