balisujohn/localwriter

Please support Ollama or Open WebUI

Closed this issue · 22 comments

I am using Ollama with Open WebUI and would love to use that. Both provide OpenAI-compatible endpoints and should work.

I tried entering the Ollama URL HTTP://192.168.2.162:11434 or HTTP://192.168.2.162:11434/v1/chat/completions, but that did not work. I kept getting errors:

With HTTP://192.168.2.162:11434, I get HTTP Error 400: Bad Request. With HTTP://192.168.2.162:11434/v1/chat/completions, I get HTTP Error 404: Not Found.

Good idea. I think a better solution would be to implement an API call to the service being run on your local network. Open WebUI is just a GUI that lets you easily access whatever LLMs you have running on the network you are connected to.

What URL is the Ollama OpenAI-style API endpoint listening on?

With http://192.168.2.162:11434/, I get HTTP Error 400: Bad Request. With http://192.168.2.162:11434/api/chat/completions, I get HTTP Error 404: Not Found.
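For reference, a direct request to Ollama's OpenAI-compatible endpoint is an easy way to check the server independently of localwriter. A minimal sketch in Python, assuming a model named llama3.1 has already been pulled (adjust the host and model name to your setup):

```python
# Minimal sketch: call Ollama's OpenAI-compatible chat endpoint directly.
# Assumes Ollama is reachable at 192.168.2.162:11434 and the model
# "llama3.1" has already been pulled.
import json
import urllib.request

url = "http://192.168.2.162:11434/v1/chat/completions"
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```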

Actually, both URLs are hitting my Ollama instance. I see the following in the logs:

time=2024-07-30T21:42:01.258-04:00 level=INFO source=server.go:622 msg="llama runner started in 13.08 seconds"
[GIN] 2024/07/30 - 21:42:45 | 404 |     103.625µs |    192.168.2.30 | POST     "/v1/chat/completions/v1/completions"
[GIN] 2024/07/30 - 21:43:11 | 400 |     790.541µs |    192.168.2.30 | POST     "/v1/completions"
[GIN] 2024/07/30 - 21:45:31 | 404 |          55µs |    192.168.2.30 | POST     "/v1/v1/completions"
[GIN] 2024/07/30 - 21:46:56 | 404 |      43.333µs |    192.168.2.30 | GET      "/v1/v1/completions"
[GIN] 2024/07/30 - 21:47:09 | 404 |      40.666µs |    192.168.2.30 | GET      "/v1/completions"
[GIN] 2024/07/30 - 21:47:41 | 404 |      44.542µs |    192.168.2.30 | POST     "/v1/chat/completions/v1/completions"
[GIN] 2024/07/30 - 21:48:01 | 404 |      46.541µs |    192.168.2.30 | POST     "/v1/chat/completions/v1/completions"

Right, so for some reason text-generation-webui and koboldcpp have the endpoint v1/completions, which is what I was using, though they may also have v1/chat/completions. Is Ollama not compatible with that?

(That's currently hardcoded in localwriter and appended after the port number; I confirmed it works with text-generation-webui and koboldcpp.)
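That hardcoded suffix also explains the doubled paths in the Ollama log above: whatever path is already in the base URL gets the completions path appended on top. A rough sketch of the effect (illustrative code, not localwriter's actual implementation):

```python
# Illustrative only -- not localwriter's actual source. It shows why a base
# URL that already contains a path produces entries like
# "/v1/chat/completions/v1/completions" in the Ollama logs.
def build_endpoint(base_url: str) -> str:
    # localwriter appends a hardcoded completions path after the host:port
    return base_url.rstrip("/") + "/v1/completions"

print(build_endpoint("http://192.168.2.162:11434"))
# -> http://192.168.2.162:11434/v1/completions              (what you want)
print(build_endpoint("http://192.168.2.162:11434/v1/chat/completions"))
# -> .../v1/chat/completions/v1/completions                 (the 404s in the logs)
```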

Am I understanding right that this was only just added to Ollama? ollama/ollama#5209

Very curious to hear whether it works with the most recent version of Ollama; if not, I will test with Ollama to figure out compatibility.

I'm using the latest version of Ollama and I'm happy to try. Also, where do we specify the model to be used? It would be great if you could test it out with Ollama.

The model is chosen by the backend; localwriter is model-agnostic and just uses whatever model is provided at the endpoint.

I'll test for compatibility with Ollama. It's strange that it's not offering a v1/completions endpoint, since that seems to have been merged.

I also get HTTP Error 400 for http://localhost:11434, which confirms something is wrong on the Ollama side.

So the problem is that Ollama requires the model to be specified in the API request, whereas text-generation-webui and koboldcpp don't require it, so localwriter doesn't include it in its requests. I made a feature request on Ollama (ollama/ollama#6089) to support this behavior.
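Concretely, the difference is a single key in the request body. A sketch of the two payloads sent to v1/completions, with "llama3.1" as an assumed example model name:

```python
# Sketch of the two request bodies for POST /v1/completions.
# text-generation-webui and koboldcpp accept the first form and just use
# whatever model they have loaded; Ollama answers it with HTTP 400.
payload_without_model = {
    "prompt": "Continue this sentence: The quick brown fox",
    "max_tokens": 64,
}

# Adding a model name (here "llama3.1", assuming it has been pulled) is all
# Ollama needs to serve the same request.
payload_with_model = {
    "model": "llama3.1",
    **payload_without_model,
}
```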

@balisujohn I don't think that makes sense. Most apps using Ollama let users define the model to be used. We should expand the settings menu to include a default model option. I would absolutely like to specify the models (ideally both embedding and inference) being used to get better output. If I can run a 70B model, I should be able to choose it.

I can add an optional model setting, but Ollama should support requests without the model specified, to match the other OpenAI API implementations.

That's because the default model setting in localwriter will be no model specified, which will result in a 400 from Ollama, whereas ideally Ollama would just pick a model for you.

Can we add a condition to use the model only if Ollama (port 11434) is being used, so nothing changes for the other tools? I think even for the other tools, we may want to provide an option to pick the model.

I don't like that kind of implicit conditional logic, because another backend could use that port. I will create a setting called model, and it will default to an empty string. If it is not an empty string, the model key will be added to the POST payload.
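A minimal sketch of that planned behavior (names are illustrative, not the actual localwriter code):

```python
# Illustrative sketch of the planned behavior: a "model" setting that
# defaults to "" and is only added to the POST payload when set.
def build_payload(prompt: str, max_tokens: int, model_setting: str = "") -> dict:
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    if model_setting:  # empty string -> key omitted, matching current behavior
        payload["model"] = model_setting
    return payload

print(build_payload("Extend this text...", 64))              # no "model" key
print(build_payload("Extend this text...", 64, "llama3.1"))  # includes "model"
```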

That sounds great... I'll test it out as soon as you have it ready. Thanks!

Please see the newest release; it works for me with Ollama: https://github.com/balisujohn/localwriter/releases/tag/v0.0.3

Great... it works with Ollama now. We now need to improve the prompts to get better output.

The prompts work pretty well for me with openchat3.5 and text-generation-webui, but I noticed that Ollama with llama3.1 always starts an assistant-style response and refuses to just extend the selection. Does Ollama add extra tokens to the prompt? Also, feel free to open an issue or PR sharing prompts you find that work well for your backend/model combo.