About inference speed compared to TRT FP16? [urgent!]
xiao2mo opened this issue · 4 comments
Have you compared LightSeq FP16 with TRT FP16?
It seems that TRT FP16 is much faster than LightSeq FP16 on ViT? Is there something I'm missing or doing wrong?
I've got some really disappointing results:
batch 128 image inference [vit-large]
--- huggingface fp32: 6203 ms
--- ls fp32: 7408 ms [question 1: why is this slower than plain PyTorch inference?]
--- trt fp16: 1924 ms
--- ls fp16: 3701 ms [question 2: why is this so much slower than TRT? Where is the advantage of LightSeq?]
GPU: T4
BTW: I compiled the lightseq wheel (version 3.0.1) on an A100 and installed it via pip on the T4. Could that hurt performance?
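For context, a minimal sketch of the timing setup assumed for numbers like these: warm-up iterations plus torch.cuda.synchronize() around the timed loop, so asynchronous kernel launches don't skew the measurement. The benchmark() helper and the google/vit-large-patch16-224 checkpoint are illustrative choices, not something from this thread:

```python
# Minimal GPU timing sketch; benchmark() and the checkpoint below are
# illustrative, not part of LightSeq or this thread.
import time
import torch
from transformers import ViTModel

def benchmark(fn, inputs, warmup=10, iters=50):
    for _ in range(warmup):      # warm-up: exclude lazy init / kernel caching
        fn(inputs)
    torch.cuda.synchronize()     # drain queued kernels before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn(inputs)
    torch.cuda.synchronize()     # wait for the last batch before stopping the clock
    return (time.perf_counter() - start) / iters

model = ViTModel.from_pretrained("google/vit-large-patch16-224").half().cuda().eval()
images = torch.randn(128, 3, 224, 224, dtype=torch.float16, device="cuda")
with torch.no_grad():
    print(f"avg fp16 latency: {benchmark(model, images) * 1000:.1f} ms")
```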
Update with new timings:
fp32:
--- huggingface: 6.214 s [python backend]
--- lightseq: 4.908 s
fp16:
--- huggingface: 1.916 s [python backend]
--- lightseq, batch=8: 3.697 s
--- lightseq, batch=64: 3.532 s
--- lightseq, batch=128: 3.481 s
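For reference, the per-batch numbers can be reproduced by sweeping batch sizes with the benchmark() helper sketched above (again illustrative, not from either library):

```python
# Sweep batch sizes with the illustrative benchmark() helper defined earlier.
for batch in (8, 64, 128):
    images = torch.randn(batch, 3, 224, 224, dtype=torch.float16, device="cuda")
    with torch.no_grad():
        print(f"batch={batch}: {benchmark(model, images):.3f} s")
```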
In conclusion, I have compared all model outputs between the LightSeq and PyTorch implementations, after resolving a lot of implementation differences (a rough sketch of the check is below). The outputs do not match as fully as the author claims, and the speed in the long-sequence, large-batch scenario is a bit disappointing. Anyway, thanks for your great work.
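The output check was along these lines; a rough sketch, where hf_model and ls_model stand in for the two loaded models and the LightSeq call is a placeholder, not its real inference API:

```python
# Rough sketch of an output comparison; hf_model / ls_model are placeholders
# for the two loaded models, and the LightSeq call below is not its real API.
import torch

with torch.no_grad():
    hf_out = hf_model(images).last_hidden_state
    ls_out = torch.as_tensor(ls_model(images), device=hf_out.device)  # placeholder call

print("max abs diff:", (hf_out - ls_out).abs().max().item())
print("allclose (atol=1e-3):", torch.allclose(hf_out, ls_out, atol=1e-3, rtol=1e-3))
```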
I'll call it a day here.