[Dynamic Batching] Concerns about features that are not supported when using Medusa

Question

Ageliss opened this issue 10 months ago · 0 comments

I looked into TRT-LLM and found a few things confusing. Some features seem to be unsupported or restricted:

inference batch size == 1 (this seems to have been solved recently)
no support for in-flight batching, which is a big concern because this feature greatly improves throughput
temperature == 0 only; what about temperature > 0?
kv_cache: I guess the kv_cache needs to be recomputed, since generate_tokens will be > 1
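
To make the kv_cache point concrete, here is a toy sketch of how I understand greedy (temperature == 0) acceptance: the longest prefix of the drafted tokens that matches the base model's greedy tokens is kept. This is not TRT-LLM code; the function name `count_accepted` and the token IDs are made up for illustration.

```python
from typing import List


def count_accepted(draft: List[int], verified: List[int]) -> int:
    """Length of the longest draft prefix that matches the base model's greedy tokens."""
    accepted = 0
    for d, v in zip(draft, verified):
        if d != v:
            break
        accepted += 1
    return accepted


# Hypothetical token IDs for two requests in the same batch.
batch_draft = [[5, 7, 9], [3, 4, 8]]
batch_verified = [[5, 7, 2], [1, 4, 8]]

# Each request may accept a different number of drafted tokens in one step.
accepted_per_request = [
    count_accepted(d, v) for d, v in zip(batch_draft, batch_verified)
]
print(accepted_per_request)
```

If this is right, each request in a batch advances by a different number of tokens per step, which is why I am unsure how the kv_cache and in-flight batching would handle it.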

My biggest concern: does Medusa2 conflict with in-flight batching?