bytedance/lightseq

About inference speed compared to TRT fp16? [urgent!]

xiao2mo opened this issue · 4 comments

Have you compared LightSeq fp16 with TRT fp16?
It seems that TRT fp16 is much faster than LightSeq fp16 on ViT. Is there something I'm missing or doing wrong?
I’ve got some really disappointing results:
Batch-128 image inference [ViT-large]

--- huggingface fp32: 6203 ms
--- ls fp32: 7408 ms [question 001: why is this slower than pure PyTorch inference?]
--- trt fp16: 1924 ms
--- ls fp16: 3701 ms [question 002: why is this so much slower than TRT? Where is LightSeq's advantage?]

GPU: T4
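For reference, a minimal sketch of how the HuggingFace side of such a benchmark could be timed (the checkpoint name and iteration counts here are assumptions, not what I actually ran; the key points are warm-up runs and `torch.cuda.synchronize()` before reading the clock):

```python
# Minimal GPU timing sketch (assumed checkpoint and sizes, for illustration).
import time
import torch
from transformers import ViTModel

device = torch.device("cuda")
model = ViTModel.from_pretrained("google/vit-large-patch16-224").to(device).eval()
images = torch.randn(128, 3, 224, 224, device=device)

def timed_forward(model, images, label):
    with torch.no_grad():
        # Warm-up so one-time CUDA init and autotuning are excluded.
        for _ in range(3):
            model(pixel_values=images)
        torch.cuda.synchronize()
        start = time.perf_counter()
        model(pixel_values=images)
        torch.cuda.synchronize()  # wait for all kernels before stopping the clock
        print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")

timed_forward(model, images, "huggingface fp32")
timed_forward(model.half(), images.half(), "huggingface fp16")
```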
BTW: I compiled the LightSeq wheel (version 3.0.1) on an A100 and installed it on the T4 via pip. Could that hurt performance?
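On that cross-GPU question: the A100 is compute capability 8.0 (sm_80) and the T4 is 7.5 (sm_75), and sm_80 binaries cannot run on sm_75, so a wheel built only for the A100's architecture would typically fail to load its kernels on the T4 rather than just run slower, unless sm_75 code (or PTX for an arch the T4 supports) was also embedded at build time. A quick sanity check, as a sketch (LightSeq exposes no arch-list API that I know of, so PyTorch's own build info is only a proxy):

```python
# Check which GPU we are on and which architectures the local PyTorch
# build targets (a proxy only; it says nothing about the lightseq wheel).
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
print("torch CUDA arch list:", torch.cuda.get_arch_list())
```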

+1 update

Timing update:

fp32:
huggingface: 6.214039353188127 s [python backend]
lightseq: 4.908 s

fp16:
huggingface: 1.9158 s [python backend]

fp16, batch=8:
lightseq: 3.6968295574188232 s

fp16, batch=128:
lightseq: 3.481155266985297 s

fp16, batch=64:
lightseq: 3.532287359237671 s
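For completeness, a sketch of how such a batch-size sweep could be scripted, using the HuggingFace fp16 model as a runnable stand-in (I have not verified LightSeq's ViT Python API, so the PyTorch forward call below is just a placeholder for the LightSeq one):

```python
# Batch-size sweep sketch (HuggingFace model as a stand-in for LightSeq).
import time
import torch
from transformers import ViTModel

model = ViTModel.from_pretrained("google/vit-large-patch16-224").half().cuda().eval()
with torch.no_grad():
    for batch in (8, 64, 128):
        images = torch.randn(batch, 3, 224, 224, device="cuda", dtype=torch.float16)
        model(pixel_values=images)  # warm-up for this shape
        torch.cuda.synchronize()
        start = time.perf_counter()
        model(pixel_values=images)
        torch.cuda.synchronize()
        print(f"batch={batch}: {time.perf_counter() - start:.4f} s")
```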

In conclusion, I have compared all model outputs between the LightSeq and PyTorch implementations, resolving a lot of implementation differences along the way (a sketch of the kind of check I mean follows below). It seems the results do not fully match, contrary to what the author mentioned, and the performance in the long-sequence, large-batch scenario is a bit disappointing. Anyway, thanks for your great work.
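The kind of output check I mean, as a self-contained sketch (here fp16 vs fp32 of the same HuggingFace model stands in for LightSeq vs PyTorch, and the tolerance is an assumption):

```python
# Compare outputs of two implementations of the same model; fp16 vs fp32
# of one HuggingFace model stands in for lightseq vs pytorch here.
import torch
from transformers import ViTModel

model = ViTModel.from_pretrained("google/vit-large-patch16-224").cuda().eval()
images = torch.randn(4, 3, 224, 224, device="cuda")
with torch.no_grad():
    ref = model(pixel_values=images).last_hidden_state
    out = model.half()(pixel_values=images.half()).last_hidden_state.float()
print("max abs diff:", (ref - out).abs().max().item())
print("allclose(atol=1e-2):", torch.allclose(ref, out, atol=1e-2))
```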

Let's call it an end here.