Some sort of way to handle the model being used by 2 users at the same time.
This is probably a limitation with ollama, but I opened two tabs and asked a question in both, and it waited for the first one to finish before it started on the second.
If that's an ollama limit, maybe show some sort of message saying the model is in use by another user?
Perhaps a queue system like the one in Gradio?
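For what it's worth, here's a minimal sketch of what that could look like in a Python backend that proxies to Ollama: an `asyncio.Lock` serializes generations (waiters are woken in FIFO order, so it doubles as a simple queue), and the response flags whether the request had to wait, so the UI can show a "model is in use" message. The `/chat` route, request shape, and model name are hypothetical; only Ollama's `/api/generate` endpoint is the real API.

```python
import asyncio

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model_lock = asyncio.Lock()  # one generation at a time; waiters queue FIFO


class ChatRequest(BaseModel):  # hypothetical request shape
    model: str
    prompt: str


@app.post("/chat")  # hypothetical route, not the project's actual API
async def chat(req: ChatRequest):
    was_queued = model_lock.locked()  # another user is mid-generation
    async with model_lock:  # blocks here until it's our turn
        async with httpx.AsyncClient(timeout=None) as client:
            resp = await client.post(
                "http://localhost:11434/api/generate",  # Ollama's real endpoint
                json={"model": req.model, "prompt": req.prompt, "stream": False},
            )
    body = resp.json()
    body["was_queued"] = was_queued  # lets the UI say "model was in use"
    return body
```

Gradio's built-in queue does something similar at the framework level, which is presumably what the suggestion above refers to.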
Looks like it's in progress.
OK, update: I compiled ollama from the fork, and not only does it work with concurrency, it's so much faster now! Yay!
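In case anyone wants to reproduce the check, here's a rough way to verify concurrency: fire two generations at once and compare the wall time against a single one. This assumes a local Ollama at the default port 11434 and a model named llama2; the script itself is just an illustration.

```python
import asyncio
import time

import httpx


async def generate(client: httpx.AsyncClient, prompt: str) -> dict:
    # Single non-streaming generation against Ollama's real /api/generate endpoint
    resp = await client.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama2", "prompt": prompt, "stream": False},
    )
    return resp.json()


async def main() -> None:
    async with httpx.AsyncClient(timeout=None) as client:
        start = time.perf_counter()
        # If requests are serialized, this takes roughly 2x a single request;
        # with true concurrency it should take closer to 1x.
        await asyncio.gather(
            generate(client, "Explain DNS in one sentence."),
            generate(client, "Explain TCP in one sentence."),
        )
        print(f"two concurrent requests: {time.perf_counter() - start:.1f}s")


asyncio.run(main())
```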
Wait, what? Is the text generation faster, or just the model switching?
The text gen. I think it was because I had the llama-cuda Arch package installed, and it was glitchy; compiling ollama manually fixed it.