Continuous Batching
Closed this issue · 1 comment
Meowmix42069 commented
Hello, I'm a user of a llama.rn derivative app, and I'm wondering why continuous batching is not included in your implementation. As I understand it, continuous batching should be enabled by default for all server launches. What would be the easiest way to implement this feature?
jhen0409 commented
Is this what you mean? https://github.com/ggerganov/llama.cpp/blob/2b1f616b208a4a21c4ee7a7eb85d822ff1d787af/examples/server/README.md?plain=1#L162-L167
If so, we already have #30 tracking it. Also, since we've recently needed this internally as well, I'm sure we'll be supporting it soon.
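For readers following along, the linked README section describes the llama.cpp server's continuous batching feature: a fixed pool of sequence slots is refilled from a request queue as soon as a sequence finishes, and every decode step runs a single batch containing the next token of each active sequence. The sketch below is a conceptual illustration of that scheduling loop only; all names in it (`Request`, `Slot`, `decodeBatch`, `sampleToken`, `EOS`, `NUM_SLOTS`) are hypothetical placeholders, not the llama.rn or llama.cpp API, and prompt prefill plus KV-cache management are heavily simplified.

```typescript
// Conceptual sketch of a continuous-batching loop. All types and functions
// here are hypothetical stand-ins, not the llama.rn or llama.cpp API.

interface Request {
  promptTokens: number[];
  onToken: (token: number) => void;
  onDone: () => void;
}

interface Slot {
  id: number;          // maps to a sequence id in the KV cache
  request?: Request;   // undefined while the slot is free
  tokens: number[];    // tokens seen so far for this request
}

const EOS = -1;        // placeholder end-of-sequence marker

// Stand-in for one forward pass over the whole batch (one call per step).
function decodeBatch(items: { token: number; seqId: number; pos: number }[]): void {
  // A real implementation would run the model once over all items,
  // writing KV-cache entries per seqId.
}

// Stand-in sampler: random token id, occasionally EOS to end a sequence.
function sampleToken(seqId: number): number {
  return Math.random() < 0.1 ? EOS : Math.floor(Math.random() * 32000);
}

const NUM_SLOTS = 4;
const slots: Slot[] = Array.from({ length: NUM_SLOTS }, (_, id): Slot => ({ id, tokens: [] }));
const queue: Request[] = [];

function step(): void {
  // 1. Admit queued requests into free slots as soon as they open up --
  //    this is what makes the batching "continuous" rather than static.
  for (const slot of slots) {
    if (!slot.request && queue.length > 0) {
      const req = queue.shift()!;
      slot.request = req;
      slot.tokens = [...req.promptTokens];
    }
  }

  // 2. Build one batch holding the latest token of every active sequence.
  //    (Prompt prefill is simplified: a real implementation would first feed
  //    all prompt tokens of newly admitted sequences.)
  const items = slots
    .filter((s) => s.request)
    .map((s) => ({
      token: s.tokens[s.tokens.length - 1],
      seqId: s.id,
      pos: s.tokens.length - 1,
    }));
  if (items.length === 0) return;

  // 3. Single forward pass for all active sequences.
  decodeBatch(items);

  // 4. Sample per sequence; a finished sequence frees its slot immediately,
  //    so a queued request can join on the very next step.
  for (const slot of slots) {
    if (!slot.request) continue;
    const next = sampleToken(slot.id);
    if (next === EOS) {
      slot.request.onDone();
      slot.request = undefined;
      slot.tokens = [];
    } else {
      slot.tokens.push(next);
      slot.request.onToken(next);
    }
  }
}

// Example driver: enqueue a request and run until everything drains.
queue.push({
  promptTokens: [1, 2, 3],
  onToken: (t) => console.log("token", t),
  onDone: () => console.log("done"),
});
while (slots.some((s) => s.request) || queue.length > 0) step();
```

The key property the sketch tries to show is that new requests never wait for the whole batch to finish; they only wait for any one slot to free up, which is what distinguishes continuous batching from static batching.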