mybigday/llama.rn

Continuous Batching

Closed this issue · 1 comment

Hello, I am a user of an app built on llama.rn, and I am wondering why continuous batching is not included in your implementation. As I understand it, continuous batching should be enabled by default whenever the server is launched. What would be the easiest way to implement this feature?

Is this what you mean? https://github.com/ggerganov/llama.cpp/blob/2b1f616b208a4a21c4ee7a7eb85d822ff1d787af/examples/server/README.md?plain=1#L162-L167

If so, we already have #30 for that. Also, since we've recently needed this internally as well, I'm sure we'll be supporting it soon.
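
For context, here is a minimal sketch of the mechanism the linked llama.cpp server README refers to: each active request keeps its own KV-cache sequence, and every decode step packs the next token(s) from all active sequences into a single llama_batch, so requests are served together and new ones can join between steps. This is only an illustration built on the llama.cpp C API (llama_batch_init, llama_decode, llama_batch_free); it is not llama.rn's or the server's actual code, and sampling/stop handling is elided.

```cpp
// Illustrative sketch only, NOT llama.rn's implementation.
// Assumes the llama.cpp C API: llama_batch_init / llama_decode / llama_batch_free.
#include "llama.h"
#include <vector>

struct Slot {
    llama_seq_id seq_id;              // one KV-cache sequence per request
    std::vector<llama_token> pending; // tokens still to feed (prompt or last sampled token)
    llama_pos n_past = 0;             // tokens of this sequence already in the cache
    bool active = false;
};

// One "continuous batching" step: pack work from every active slot into a
// single batch and run one llama_decode for all of them. New requests can be
// attached to a free slot between calls, so they join without waiting for
// the other requests to finish.
void step(llama_context * ctx, std::vector<Slot> & slots, int32_t n_batch) {
    llama_batch batch = llama_batch_init(n_batch, 0, 1);

    for (auto & slot : slots) {
        if (!slot.active || slot.pending.empty()) continue;
        if (batch.n_tokens + (int32_t) slot.pending.size() > n_batch) break; // defer to next step

        for (llama_token tok : slot.pending) {
            const int32_t i = batch.n_tokens++;
            batch.token[i]     = tok;
            batch.pos[i]       = slot.n_past++;
            batch.n_seq_id[i]  = 1;
            batch.seq_id[i][0] = slot.seq_id;
            batch.logits[i]    = false;
        }
        // only the last token of each sequence needs logits for sampling
        batch.logits[batch.n_tokens - 1] = true;
        slot.pending.clear();
    }

    if (batch.n_tokens > 0) {
        llama_decode(ctx, batch);
        // per-slot sampling via llama_get_logits_ith(...) would go here
    }

    llama_batch_free(batch);
}
```

In the llama.cpp server this loop runs continuously; when a slot finishes, its sequence is removed from the KV cache and the slot is handed to the next queued request, which is what lets batching stay "continuous" rather than waiting for a whole batch to drain.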