balisujohn/localwriter

Please support Ollama or Open WebUI

Closed this issue · 22 comments

I am using Ollama with Open WebUI and would love to use that. Both provide OpenAI-compatible endpoints and should work.

I tried entering the Ollama URL HTTP://192.168.2.162:11434 or HTTP://192.168.2.162:11434/v1/chat/completions, but that did not work. I kept getting errors:

With HTTP://192.168.2.162:11434, I get HTTP Error 400: Bad Request. With HTTP://192.168.2.162:11434/v1/chat/completions, I get HTTP Error 404: Not Found.

Good idea. I think a better solution would be to implement an API call to the service being run on your local network. Open WebUI is just a GUI that lets you easily access whatever LLMs you have running on the network you are connected to.

What URL is the Ollama OpenAI-style API endpoint listening on?

With http://192.168.2.162:11434/, I get HTTP Error 400: Bad Request. With http://192.168.2.162:11434/api/chat/completions, I get HTTP Error 404: Not Found.
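For reference, a direct request to Ollama's OpenAI-compatible endpoint is an easy way to check the server independently of localwriter. A minimal sketch in Python, assuming a model named llama3.1 has already been pulled (adjust the host and model name to your setup):

```python
# Minimal sketch: call Ollama's OpenAI-compatible chat endpoint directly.
# Assumes Ollama is reachable at 192.168.2.162:11434 and the model
# "llama3.1" has already been pulled.
import json
import urllib.request

url = "http://192.168.2.162:11434/v1/chat/completions"
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```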

Actually, both URLs are hitting my Ollama instance. I see the following in the logs:

time=2024-07-30T21:42:01.258-04:00 level=INFO source=server.go:622 msg="llama runner started in 13.08 seconds"
[GIN] 2024/07/30 - 21:42:45 | 404 |     103.625µs |    192.168.2.30 | POST     "/v1/chat/completions/v1/completions"
[GIN] 2024/07/30 - 21:43:11 | 400 |     790.541µs |    192.168.2.30 | POST     "/v1/completions"
[GIN] 2024/07/30 - 21:45:31 | 404 |          55µs |    192.168.2.30 | POST     "/v1/v1/completions"
[GIN] 2024/07/30 - 21:46:56 | 404 |      43.333µs |    192.168.2.30 | GET      "/v1/v1/completions"
[GIN] 2024/07/30 - 21:47:09 | 404 |      40.666µs |    192.168.2.30 | GET      "/v1/completions"
[GIN] 2024/07/30 - 21:47:41 | 404 |      44.542µs |    192.168.2.30 | POST     "/v1/chat/completions/v1/completions"
[GIN] 2024/07/30 - 21:48:01 | 404 |      46.541µs |    192.168.2.30 | POST     "/v1/chat/completions/v1/completions"

Right, so for some reason text-generation-webui and koboldcpp have the endpoint v1/completions, which is what I was using, though they may also have v1/chat/completions. Is Ollama not compatible with that?

(That's currently hardcoded in localwriter and appended after the port number; I confirmed it works with text-generation-webui and koboldcpp.)
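That hardcoded suffix also explains the doubled paths in the Ollama log above: whatever path is already in the base URL gets the completions path appended on top. A rough sketch of the effect (illustrative code, not localwriter's actual implementation):

```python
# Illustrative only -- not localwriter's actual source. It shows why a base
# URL that already contains a path produces entries like
# "/v1/chat/completions/v1/completions" in the Ollama logs.
def build_endpoint(base_url: str) -> str:
    # localwriter appends a hardcoded completions path after the host:port
    return base_url.rstrip("/") + "/v1/completions"

print(build_endpoint("http://192.168.2.162:11434"))
# -> http://192.168.2.162:11434/v1/completions              (what you want)
print(build_endpoint("http://192.168.2.162:11434/v1/chat/completions"))
# -> .../v1/chat/completions/v1/completions                 (the 404s in the logs)
```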

Am I understanding right that this was only just added to Ollama? ollama/ollama#5209

Very curious to hear whether it works with the most recent version of Ollama; if not, I will test with Ollama to figure out compatibility.

I'm using the latest version of Ollama and I'm happy to try. Also, where do we specify the model to be used? It would be great if you could test it out with Ollama.

The model is chosen by the backend; localwriter is model-agnostic and just uses whatever model is provided at the endpoint.

I'll test for compatibility with Ollama. It's strange that it's not offering a v1/completions endpoint, since that seems to have been merged.

I also get HTTP Error 400 for http://localhost:11434, which confirms something is wrong on the Ollama side.

So the problem is that Ollama requires the model to be specified in the API request, whereas text-generation-webui and koboldcpp don't require it, so localwriter doesn't include it in its requests. I made a feature request on Ollama (ollama/ollama#6089) to support this behavior.
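Concretely, the difference is a single key in the request body. A sketch of the two payloads sent to v1/completions, with "llama3.1" as an assumed example model name:

```python
# Sketch of the two request bodies for POST /v1/completions.
# text-generation-webui and koboldcpp accept the first form and just use
# whatever model they have loaded; Ollama answers it with HTTP 400.
payload_without_model = {
    "prompt": "Continue this sentence: The quick brown fox",
    "max_tokens": 64,
}

# Adding a model name (here "llama3.1", assuming it has been pulled) is all
# Ollama needs to serve the same request.
payload_with_model = {
    "model": "llama3.1",
    **payload_without_model,
}
```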

@balisujohn I don't think that makes sense. Most apps using Ollama let users define the model to be used. We should expand the settings menu to include a default model option. I would absolutely like to specify the models (ideally both embedding and inference) being used to get better output. If I can run a 70B model, I should be able to choose it.

I can add an optional model setting, but Ollama should support requests without the model specified, to match the other OpenAI API implementations.

That's because the default model setting in localwriter will be no model specified, which will result in a 400 from Ollama, whereas ideally Ollama would just pick a model for you.

Can we add a condition to use the model only if Ollama (port 11434) is being used, so nothing changes for the other tools? I think even for the other tools, we may want to provide an option to pick the model.

I don't like that kind of implicit conditional logic, because another backend could use that port. I will create a setting called model, and it will default to an empty string. If it is not an empty string, the model key will be added to the POST payload.
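A minimal sketch of that planned behavior (names are illustrative, not the actual localwriter code):

```python
# Illustrative sketch of the planned behavior: a "model" setting that
# defaults to "" and is only added to the POST payload when set.
def build_payload(prompt: str, max_tokens: int, model_setting: str = "") -> dict:
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    if model_setting:  # empty string -> key omitted, matching current behavior
        payload["model"] = model_setting
    return payload

print(build_payload("Extend this text...", 64))              # no "model" key
print(build_payload("Extend this text...", 64, "llama3.1"))  # includes "model"
```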

That sounds great... I'll test it out as soon as you have it ready. Thanks!

Please see the newest release; it works for me with Ollama: https://github.com/balisujohn/localwriter/releases/tag/v0.0.3

Great... it works with Ollama now. We now need to improve the prompts to get better output.

The prompts work pretty well for me with openchat3.5 and text-generation-webui, but I noticed that Ollama with llama3.1 always starts an assistant-style response and refuses to just extend the selection. Does Ollama add extra tokens to the prompt? Also, feel free to open an issue or PR sharing prompts you find that work well for your backend/model combo.