No reduction in inference time for qat model
Closed this issue · 2 comments
demuxin commented
I have deployed a yolov9-qat model with the TensorRT C++ API on an RTX 3090, but the inference time is the same as the fp16 model.
The only modification I made to the fp16 code was adding this:
config->setFlag(nvinfer1::BuilderFlag::kINT8);
i.e., I set both the int8 and fp16 flags:
config->setFlag(nvinfer1::BuilderFlag::kINT8);
config->setFlag(nvinfer1::BuilderFlag::kFP16);
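For context, here is a minimal sketch of how these two flags fit into a full engine build for a QAT model. The function name `buildEngine` and the surrounding structure are illustrative (not from the thread), and exact API names vary across TensorRT versions; the key point is that a QAT ONNX model carries Q/DQ nodes with its own scales, so no INT8 calibrator is needed:

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>

// Minimal logger required by the TensorRT builder.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
};

// Sketch: build an engine from a QAT ONNX model with both INT8 and FP16
// enabled. The Q/DQ nodes in the graph supply the quantization scales.
nvinfer1::ICudaEngine* buildEngine(const char* onnxPath, Logger& logger) {
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, logger);
    parser->parseFromFile(
        onnxPath, static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    config->setFlag(nvinfer1::BuilderFlag::kINT8);  // honor Q/DQ scales
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // FP16 fallback for
                                                    // non-quantized layers
    return builder->buildEngineWithConfig(*network, *config);
}
```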
I also tested infer-yolov9-c-qat-end2end.onnx; there is still no reduction in inference time for this model.
levipereira commented
Run this test and post results:
https://github.com/levipereira/yolov9-qat?tab=readme-ov-file#benchmark
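For reference, a hedged sketch of benchmarking the model directly with `trtexec` (the exact flags the linked benchmark uses may differ; the model filename is taken from the thread):

```shell
# Benchmark the QAT model with both INT8 and FP16 enabled; trtexec prints
# mean/median GPU latency and throughput at the end of the run, which can
# be compared against a pure-FP16 build of the non-QAT model.
trtexec --onnx=infer-yolov9-c-qat-end2end.onnx --int8 --fp16
```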
demuxin commented
Sorry, I made a mistake with the model.