levipereira/yolov9-qat

No reduction in inference time for qat model

Closed this issue · 2 comments

I have deployed the yolov9-qat model with C++ TensorRT on an RTX 3090, but I find the inference time is the same as FP16.
The only modification I made to the FP16 code was to add this:

config->setFlag(nvinfer1::BuilderFlag::kINT8);

i.e., I set both FP16 and INT8:

config->setFlag(nvinfer1::BuilderFlag::kINT8);
config->setFlag(nvinfer1::BuilderFlag::kFP16);
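
For reference, here is a minimal sketch of how the engine-building step might look with both flags set. This assumes the TensorRT 8.x C++ API; the file names yolov9-c-qat.onnx and yolov9-c-qat.engine are placeholders and error handling is omitted. For a QAT model the INT8 scales come from the Q/DQ nodes embedded in the ONNX graph, so no calibrator is needed:

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>
#include <fstream>
#include <memory>

// Minimal logger required by the TensorRT builder and parser.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
};

int main() {
    Logger logger;
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
        builder->createNetworkV2(1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));

    // Parse the QAT ONNX model (placeholder file name).
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, logger));
    parser->parseFromFile("yolov9-c-qat.onnx",
                          static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    // kINT8 enables the INT8 kernels; kFP16 lets unquantized layers fall back to FP16.
    config->setFlag(nvinfer1::BuilderFlag::kINT8);
    config->setFlag(nvinfer1::BuilderFlag::kFP16);

    // Build and serialize the engine to disk (placeholder file name).
    auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(
        builder->buildSerializedNetwork(*network, *config));
    std::ofstream out("yolov9-c-qat.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
    return 0;
}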

I also tested infer-yolov9-c-qat-end2end.onnx; there is still no reduction in inference time for that model.

Sorry, I made a mistake with the model.