NVlabs/FasterViT

How to improve FasterViT's INT8 performance

tp-nan opened this issue · 1 comment

tp-nan commented

FasterViT delivers very strong performance. Thank you for your work.

I found that with TensorRT, the INT8 (--best) and FP16 latencies are very close: 1.46077 ms and 1.36375 ms, respectively. Because the last two stages of the network are fused into a single Myelin layer, it is not possible to analyze the per-layer timing in detail.

If I want to improve FasterViT's INT8 performance, are there any feasible directions?

TensorRT version: 8.6.1
Machine: Tesla T4
ONNX opset: 17

trtexec --onnx=./deployment/faster_vit_0_224_17.onnx --best
trtexec --onnx=./deployment/faster_vit_0_224_17.onnx --fp16
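One way to get at least partial per-layer visibility, despite the Myelin fusion, is to re-run trtexec with detailed profiling and export the profile, then aggregate it offline. The sketch below assumes the standard trtexec profiling flags (`--profilingVerbosity=detailed --dumpProfile --exportProfile`) and that the exported JSON is a list of per-layer records carrying `name` and `averageMs` fields; the exact schema can vary between TensorRT versions, and `profile_int8.json` is a hypothetical output path.

```python
# Sketch: summarize a trtexec per-layer profile exported with, e.g.:
#   trtexec --onnx=./deployment/faster_vit_0_224_17.onnx --best \
#           --profilingVerbosity=detailed --dumpProfile \
#           --exportProfile=profile_int8.json
# Assumption: each per-layer record is a dict with "name" and "averageMs".
import json


def slowest_layers(profile_path, top_k=5):
    """Return the top_k (name, averageMs) pairs, slowest first."""
    with open(profile_path) as f:
        records = json.load(f)
    # Skip summary entries (e.g. a leading count record) that lack timings.
    layers = [r for r in records if isinstance(r, dict) and "averageMs" in r]
    layers.sort(key=lambda r: r["averageMs"], reverse=True)
    return [(r["name"], r["averageMs"]) for r in layers[:top_k]]
```

Comparing the resulting rankings between the `--best` and `--fp16` runs can show whether the fused Myelin block or the surrounding convolutional stages dominate the INT8 latency.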
tp-nan commented

Closed as moved to NVIDIA/TensorRT#3186