[Dynamic Batching] Concerns about features that are not supported when using Medusa
Ageliss opened this issue · 0 comments
Ageliss commented
I checked TRT-LLM and found a few things confusing. Medusa seems to come with several unsupported features or restrictions:
- Inference batch size must be 1 (this seems to have been solved recently).
- In-flight batching is not supported, which is a big concern since this feature greatly improves throughput.
- Only temperature == 0 is supported. What about temperature > 0?
- kv_cache: I guess the kv_cache needs to be recomputed because generate_tokens will be > 1.
My biggest concern: does Medusa2 conflict with in-flight batching?
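
To make the temperature and kv_cache questions more concrete, here is a minimal sketch of how I understand greedy (temperature == 0) verification of speculated tokens. This is not TRT-LLM code: `accept_greedy` and `trim_kv_cache` are hypothetical names, and the sketch ignores tree attention and the extra token the base model contributes each step. It only shows why a step can write more kv_cache entries than it ends up keeping.

```python
from typing import List, Tuple


def accept_greedy(candidate: List[int], base_argmax: List[int]) -> int:
    """Count how many speculated tokens match the base model's greedy choice.

    candidate[i] is the i-th speculated token (hypothetical input);
    base_argmax[i] is the token the base model would pick greedily
    at that same position. Acceptance stops at the first mismatch.
    """
    accepted = 0
    for spec, ref in zip(candidate, base_argmax):
        if spec != ref:
            break
        accepted += 1
    return accepted


def trim_kv_cache(kv_len_before: int, num_candidates: int, accepted: int) -> Tuple[int, int]:
    """Return (new_kv_len, discarded_entries) after one verification step.

    Assumption: the forward pass wrote one kv_cache entry per candidate
    token, and entries for rejected tokens have to be dropped.
    """
    written = kv_len_before + num_candidates
    new_len = kv_len_before + accepted
    return new_len, written - new_len


if __name__ == "__main__":
    candidate = [5, 9, 2, 7]      # speculated tokens
    base_argmax = [5, 9, 4, 7]    # base model's greedy picks
    n = accept_greedy(candidate, base_argmax)
    print(n, trim_kv_cache(10, len(candidate), n))
```

With temperature == 0 the acceptance check is an exact match, as above. With temperature > 0 some other acceptance rule would be needed, which is part of what I am asking about. For in-flight batching, each request in the batch could accept a different number of tokens per step, so the kv_cache length would advance unevenly across requests.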