FasterDecoding/Medusa

[Dynamic Batching] Concerns about features that are not supported when using Medusa

Ageliss opened this issue · 0 comments

I checked TRT-LLM and found something confusing. Several features appear to be unsupported:

  1. Only inference batch size == 1 is supported (this seems to have been solved recently).
  2. In-flight batching is not supported, which is a major concern since this feature greatly improves throughput.
  3. Only temperature == 0 is supported. How about temperature > 0?
  4. kv_cache: I guess the kv_cache needs to be recomputed because more than one token is generated per step (generate_tokens > 1).
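
To make point 4 concrete, here is a toy sketch in plain Python. It is not TRT-LLM or Medusa code: `ToySequence` and `speculative_step` are made-up names, and the cache is just a list with one entry per token. It assumes that a decoding step writes cache entries for every candidate token and then keeps only the entries of the accepted ones, which may not match what TRT-LLM actually does.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySequence:
    """Hypothetical stand-in for one request; not a TRT-LLM or Medusa type."""
    kv_cache: List[int] = field(default_factory=list)  # one entry per cached token


def speculative_step(seq: ToySequence, candidates: List[int], num_accepted: int) -> None:
    """Append entries for all candidates, then keep only the accepted prefix."""
    if not 0 <= num_accepted <= len(candidates):
        raise ValueError("num_accepted must be between 0 and len(candidates)")
    committed = len(seq.kv_cache)
    # Entries written while the candidate tokens are being verified.
    seq.kv_cache.extend(candidates)
    # Drop the entries that belong to rejected candidates.
    del seq.kv_cache[committed + num_accepted:]


# Two requests in the same batch accept a different number of tokens.
batch = [ToySequence(kv_cache=[1, 2, 3]), ToySequence(kv_cache=[7, 8])]
speculative_step(batch[0], candidates=[10, 11, 12], num_accepted=3)
speculative_step(batch[1], candidates=[20, 21, 22], num_accepted=1)

cache_lengths = [len(seq.kv_cache) for seq in batch]
print(cache_lengths)
```

Under this assumption, each sequence in a batch advances by a different number of tokens per step, so the cache lengths drift apart. That per-sequence bookkeeping is the part that might interact with in-flight batching.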

My biggest concern: does Medusa2 conflict with in-flight batching?
