Batching means computing multiple input streams at the same time. This vectorizes better (the weights are loaded once and reused across all streams) and can thus speed up inference per token.
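
A minimal sketch of the idea, using NumPy (the names `n_embd`, `batch`, and the shapes here are illustrative, not this project's actual code): stacking the per-stream input vectors into one matrix turns many matrix-vector products into a single matrix-matrix product, which the underlying BLAS kernel can vectorize and block far more effectively.

```python
import numpy as np

n_embd, batch = 768, 8  # illustrative sizes, not taken from this repo
rng = np.random.default_rng(0)
W = rng.standard_normal((n_embd, n_embd)).astype(np.float32)  # one layer's weights

# Unbatched: one mat-vec per input stream; W is re-read from memory each time.
xs = [rng.standard_normal(n_embd).astype(np.float32) for _ in range(batch)]
ys_unbatched = [W @ x for x in xs]

# Batched: stack the streams and do a single mat-mat product.
X = np.stack(xs)   # shape (batch, n_embd)
Y = X @ W.T        # shape (batch, n_embd); row i equals W @ xs[i]

assert all(np.allclose(y, Y[i], atol=1e-4) for i, y in enumerate(ys_unbatched))
```

The win comes from arithmetic intensity: per-token inference is typically memory-bound on loading the weights, so amortizing each weight load over `batch` streams raises throughput even though the work per stream is unchanged.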