Some sort of way to handle the model being used by 2 users at the same time.
This is probably a limitation with ollama, but I opened two tabs and asked a question in both, and it waited for the first one to finish before it started on the second.
If that's an ollama limit, maybe show some sort of message saying the model is in use by another user?
Perhaps a queue system like the one in Gradio?
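For what it's worth, here's a minimal sketch of what that could look like in a Python backend that proxies to Ollama: an `asyncio.Lock` serializes generations (waiters are woken in FIFO order, so it doubles as a simple queue), and the response flags whether the request had to wait, so the UI can show a "model is in use" message. The `/chat` route, request shape, and model name are hypothetical; only Ollama's `/api/generate` endpoint is the real API.

```python
import asyncio

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model_lock = asyncio.Lock()  # one generation at a time; waiters queue FIFO


class ChatRequest(BaseModel):  # hypothetical request shape
    model: str
    prompt: str


@app.post("/chat")  # hypothetical route, not the project's actual API
async def chat(req: ChatRequest):
    was_queued = model_lock.locked()  # another user is mid-generation
    async with model_lock:  # blocks here until it's our turn
        async with httpx.AsyncClient(timeout=None) as client:
            resp = await client.post(
                "http://localhost:11434/api/generate",  # Ollama's real endpoint
                json={"model": req.model, "prompt": req.prompt, "stream": False},
            )
    body = resp.json()
    body["was_queued"] = was_queued  # lets the UI say "model was in use"
    return body
```

Gradio's built-in queue does something similar at the framework level, which is presumably what the suggestion above refers to.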
Looks like it's in progress.
OK, update: I compiled ollama from the fork, and not only does it work with concurrency, it's so much faster now! Yay!
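In case anyone wants to reproduce the check, here's a rough way to verify concurrency: fire two generations at once and compare the wall time against a single one. This assumes a local Ollama at the default port 11434 and a model named llama2; the script itself is just an illustration.

```python
import asyncio
import time

import httpx


async def generate(client: httpx.AsyncClient, prompt: str) -> dict:
    # Single non-streaming generation against Ollama's real /api/generate endpoint
    resp = await client.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama2", "prompt": prompt, "stream": False},
    )
    return resp.json()


async def main() -> None:
    async with httpx.AsyncClient(timeout=None) as client:
        start = time.perf_counter()
        # If requests are serialized, this takes roughly 2x a single request;
        # with true concurrency it should take closer to 1x.
        await asyncio.gather(
            generate(client, "Explain DNS in one sentence."),
            generate(client, "Explain TCP in one sentence."),
        )
        print(f"two concurrent requests: {time.perf_counter() - start:.1f}s")


asyncio.run(main())
```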
Wait, what? Is the text generation faster, or just the model switching?
The text gen. I think it was because I had the llama-cuda Arch package installed, and it was glitchy; compiling ollama manually fixed it.