Latency result slower than TensorRT FP16
zhanglei1172 opened this issue · 3 comments
Hi.
Our I-ViT TVM implementation is designed for the Turing Tensor Core (RTX 2080Ti), so there could be potential issues in the Ampere Tensor Core (RTX 3090) environment which could lead to sub-optimal optimizations.
We are also working on exploring solutions. And, if it's convenient, please let me know the version of TVM and TensorRT you're using.
Thanks for your reply, my development environment is:
TVM: 0.14.dev0
TensorRT: 8.6.1
Hello, I just wanted to add that I saw the same issue on an A6000 with TensorRT 10. @zhanglei1172 can I ask, when you tried TensorRT INT8, whether you enabled FP16 as a fallback? And @zkkli, can I ask if the times in your experiments were measured using TensorRT? Thanks!
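For context on the fallback question: in the TensorRT Python API, INT8 and FP16 are independent builder flags, and setting both lets layers without INT8 kernels (or calibration data) fall back to FP16 rather than FP32. A minimal sketch of the builder config (assuming the network definition and INT8 calibrator are set up elsewhere; this is illustrative, not the exact setup either of us used):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Enable INT8 precision for the engine build.
config.set_flag(trt.BuilderFlag.INT8)
# Also enabling FP16 allows unsupported/uncalibrated layers
# to fall back to FP16 instead of FP32.
config.set_flag(trt.BuilderFlag.FP16)
```

Without the FP16 flag, any layer TensorRT cannot run in INT8 executes in FP32, which can make the "INT8" engine look slower than a pure FP16 one.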