No reduction in inference time for qat model
Closed this issue · 2 comments
demuxin commented
I have deployed a yolov9-qat model with the TensorRT C++ API on an RTX 3090, but the inference time is the same as the fp16 model.
The only modification I made to the fp16 code was adding this:
config->setFlag(nvinfer1::BuilderFlag::kINT8);
i.e., I set both the int8 and fp16 flags:
config->setFlag(nvinfer1::BuilderFlag::kINT8);
config->setFlag(nvinfer1::BuilderFlag::kFP16);
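For context, here is a minimal sketch of how these two flags fit into a full engine build for a QAT model. The function name `buildEngine` and the surrounding structure are illustrative (not from the thread), and exact API names vary across TensorRT versions; the key point is that a QAT ONNX model carries Q/DQ nodes with its own scales, so no INT8 calibrator is needed:

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>

// Minimal logger required by the TensorRT builder.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
};

// Sketch: build an engine from a QAT ONNX model with both INT8 and FP16
// enabled. The Q/DQ nodes in the graph supply the quantization scales.
nvinfer1::ICudaEngine* buildEngine(const char* onnxPath, Logger& logger) {
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, logger);
    parser->parseFromFile(
        onnxPath, static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    config->setFlag(nvinfer1::BuilderFlag::kINT8);  // honor Q/DQ scales
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // FP16 fallback for
                                                    // non-quantized layers
    return builder->buildEngineWithConfig(*network, *config);
}
```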
I also tested infer-yolov9-c-qat-end2end.onnx; there is still no reduction in inference time for this model.
levipereira commented
Run this test and post results:
https://github.com/levipereira/yolov9-qat?tab=readme-ov-file#benchmark
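For reference, a hedged sketch of benchmarking the model directly with `trtexec` (the exact flags the linked benchmark uses may differ; the model filename is taken from the thread):

```shell
# Benchmark the QAT model with both INT8 and FP16 enabled; trtexec prints
# mean/median GPU latency and throughput at the end of the run, which can
# be compared against a pure-FP16 build of the non-QAT model.
trtexec --onnx=infer-yolov9-c-qat-end2end.onnx --int8 --fp16
```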
demuxin commented
Sorry, I made a mistake with the model.