zkkli/I-ViT

latency result slower than tensorrt fp16

zhanglei1172 opened this issue · 3 comments

Hi, I tried to replicate your speed experiment. I tested deit_tiny with batch size 1 on an RTX 3090. After a few days of autotuning, it is still slower than TensorRT FP16.

Here are the results of my experiment:

[image]

zkkli commented

Hi.

Our I-ViT TVM implementation is designed for Turing Tensor Cores (RTX 2080Ti), so it may run into issues on Ampere Tensor Cores (RTX 3090) that lead to sub-optimal performance.

We are also exploring solutions. If it's convenient, please let me know which versions of TVM and TensorRT you're using.

Thanks for your reply. My development environment is:

TVM: 0.14.dev0
TensorRT: 8.6.1

Hello, I just wanted to add that I saw the same issue on an A6000 with TensorRT 10. @zhanglei1172, when you tried TensorRT INT8, did you enable FP16 as a fallback? And @zkkli, were the times in your experiments measured using TensorRT? Thanks!
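
To make numbers from different setups easier to compare, here is a minimal timing sketch. It is not the method used in the I-ViT experiments, which this thread does not describe. The names `measure_latency_ms`, `run_once`, and `sync` are hypothetical: `run_once` stands for whatever runs a single inference (TVM or TensorRT), and `sync` for a call that waits for queued GPU work to finish (for example `torch.cuda.synchronize`).

```python
import statistics
import time


def measure_latency_ms(run_once, sync=None, warmup=10, iters=100):
    """Time run_once() and return latency statistics in milliseconds.

    run_once: hypothetical zero-argument callable that runs one inference.
    sync:     optional callable that blocks until queued GPU work is done
              (an assumption for illustration, not taken from this thread).
    """
    # Warm-up runs are excluded from timing (lazy init, caches, clock ramp-up).
    for _ in range(warmup):
        run_once()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        if sync is not None:
            sync()  # make sure the work has actually finished before stopping the clock
        samples.append((time.perf_counter() - start) * 1e3)

    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
    }
```

Using the same warm-up count, iteration count, and batch size for both backends, and reporting the median, should make it clearer whether the gap comes from the kernels or from how the time was measured.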