aurelio-labs/semantic-router

Unable to set OllamaLLM post url


Hi, I wanted to use Ollama as my local LLM, but I'm hosting it in a separate Docker container from my app.

When I try to connect to Ollama from my app, I get the following error, which is expected since the two services run in different containers and localhost inside the app container doesn't point at the Ollama container:
LLM error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/chat (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f622c1ead90>: Failed to establish a new connection: [Errno 111] Connection refused'))
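As a sanity check, a request like the following from inside the app container shows that the Ollama container itself is reachable under its Compose service name (here "ollama"), so the problem is only the hardcoded localhost host:

import requests

# Ollama's /api/tags endpoint lists the locally available models; reaching it
# under the service hostname confirms the container and the network are fine.
print(requests.get("http://ollama:11434/api/tags", timeout=5).json())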

Here is my code:

from semantic_router import RouteLayer
from semantic_router.encoders import HuggingFaceEncoder
from semantic_router.llms.ollama import OllamaLLM

encoder = HuggingFaceEncoder()
_llm = OllamaLLM(llm_name="mistral")
rl = RouteLayer(encoder=encoder, routes=routes, llm=_llm)

Ideally, I would be able to instantiate OllamaLLM and set the base URL, something like the following (assuming my container is called "ollama"):

_llm = OllamaLLM(llm_name="mistral", base_url="http://ollama:11434")

However, OllamaLLM hardcodes the URL, which makes sense given that it's meant to run locally:

...
response = requests.post("http://localhost:11434/api/chat", json=payload)
output = response.json()["message"]["content"]
...

(https://github.com/aurelio-labs/semantic-router/blob/main/semantic_router/llms/ollama.py#L52)
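In the meantime, a rough workaround on my side is to subclass OllamaLLM and reimplement __call__ with a configurable URL. This is only an untested sketch: RemoteOllamaLLM is a made-up name, it assumes BaseLLM is a pydantic model (as the field declarations below suggest) and that Message exposes role/content, and temperature/max_tokens handling is omitted for brevity.

from typing import List, Optional

import requests

from semantic_router.llms.ollama import OllamaLLM
from semantic_router.schema import Message


class RemoteOllamaLLM(OllamaLLM):  # hypothetical helper, not part of semantic-router
    # declared as a field so pydantic allows assigning it after construction
    base_url: Optional[str] = "http://localhost:11434"

    def __call__(self, messages: List[Message], **kwargs) -> str:
        payload = {
            "model": self.llm_name,
            "messages": [{"role": m.role, "content": m.content} for m in messages],
            "stream": self.stream,
        }
        response = requests.post(f"{self.base_url}/api/chat", json=payload)
        return response.json()["message"]["content"]


_llm = RemoteOllamaLLM(llm_name="mistral")
_llm.base_url = "http://ollama:11434"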


I think a simple fix would be to add a base_url arg, along these lines:

diff --git a/semantic_router/llms/ollama.py b/semantic_router/llms/ollama.py
index df35ac0..3a09244 100644
--- a/semantic_router/llms/ollama.py
+++ b/semantic_router/llms/ollama.py
@@ -13,4 +13,5 @@ class OllamaLLM(BaseLLM):
     max_tokens: Optional[int]
     stream: Optional[bool]
+    base_url: Optional[str]

     def __init__(
@@ -21,4 +22,5 @@ class OllamaLLM(BaseLLM):
         max_tokens: Optional[int] = 200,
         stream: bool = False,
+        base_url: str = "http://localhost:11434",
     ):
         super().__init__(name=name)
@@ -27,4 +29,5 @@ class OllamaLLM(BaseLLM):
         self.max_tokens = max_tokens
         self.stream = stream
+        self.base_url = base_url

     def __call__(
@@ -35,4 +38,5 @@ class OllamaLLM(BaseLLM):
         max_tokens: Optional[int] = None,
         stream: Optional[bool] = None,
+        base_url: Optional[str] = None,
     ) -> str:
         # Use instance defaults if not overridden
@@ -41,4 +45,5 @@ class OllamaLLM(BaseLLM):
         max_tokens = max_tokens if max_tokens is not None else self.max_tokens
         stream = stream if stream is not None else self.stream
+        base_url = base_url if base_url is not None else self.base_url

         try:
@@ -50,5 +55,5 @@ class OllamaLLM(BaseLLM):
                 "stream": stream,
             }
-            response = requests.post("http://localhost:11434/api/chat", json=payload)
+            response = requests.post(f"{base_url}/api/chat", json=payload)
             output = response.json()["message"]["content"]

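With that applied, the URL could be set per instance or overridden per call, for example (the service name "ollama" comes from my Compose file, and Message is imported from semantic_router.schema):

from semantic_router.llms.ollama import OllamaLLM
from semantic_router.schema import Message

_llm = OllamaLLM(llm_name="mistral", base_url="http://ollama:11434")

# or override it for a single call via the new __call__ argument
reply = _llm([Message(role="user", content="Hello!")], base_url="http://ollama:11434")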
Here's a draft PR on my fork: https://github.com/prbarcelon/semantic-router/pull/1/files

What are the team's thoughts? Thank you!