Unable to set OllamaLLM POST URL
Hi, I wanted to use Ollama as my local LLM, but I'm hosting it in a different Docker container than my app.
When I try to connect to Ollama from the app, I get the following error. This is expected, since localhost inside the app container resolves to that container itself, not to the one running Ollama:
LLM error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/chat (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f622c1ead90>: Failed to establish a new connection: [Errno 111] Connection refused'))
Here is my code (imports included for completeness; routes is defined elsewhere):

from semantic_router import RouteLayer
from semantic_router.encoders import HuggingFaceEncoder
from semantic_router.llms.ollama import OllamaLLM

encoder = HuggingFaceEncoder()
_llm = OllamaLLM(llm_name="mistral")
rl = RouteLayer(encoder=encoder, routes=routes, llm=_llm)
Ideally, I would be able to instantiate OllamaLLM and set the base URL, something like the following (assuming my Ollama container is named "ollama" and both containers share a Docker network):
_llm = OllamaLLM(llm_name="mistral", base_url="http://ollama:11434")
However, OllamaLLM hardcodes the URL, which makes sense given that it's meant to run locally:
...
response = requests.post("http://localhost:11434/api/chat", json=payload)
output = response.json()["message"]["content"]
...
(https://github.com/aurelio-labs/semantic-router/blob/main/semantic_router/llms/ollama.py#L52)
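In the meantime, a blunt stopgap is to monkey-patch requests.post so the hardcoded URL is rewritten before the request leaves the app container. This is just a sketch, not anything supported by semantic-router, and "ollama" is my assumed service name:

import requests

_original_post = requests.post

def _redirecting_post(url, *args, **kwargs):
    # Rewrite the hardcoded Ollama endpoint to target the "ollama" container.
    url = url.replace("http://localhost:11434", "http://ollama:11434")
    return _original_post(url, *args, **kwargs)

requests.post = _redirecting_post  # must run before the LLM is first called

This patches every requests.post call in the process, which is exactly why a proper base_url argument would be the cleaner solution.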
I think a simple fix would be to add a base_url argument, as in the following diff:
diff --git a/semantic_router/llms/ollama.py b/semantic_router/llms/ollama.py
index df35ac0..3a09244 100644
--- a/semantic_router/llms/ollama.py
+++ b/semantic_router/llms/ollama.py
@@ -13,4 +13,5 @@ class OllamaLLM(BaseLLM):
     max_tokens: Optional[int]
     stream: Optional[bool]
+    base_url: Optional[str]
 
     def __init__(
@@ -21,4 +22,5 @@ class OllamaLLM(BaseLLM):
         max_tokens: Optional[int] = 200,
         stream: bool = False,
+        base_url: str = "http://localhost:11434",
     ):
         super().__init__(name=name)
@@ -27,4 +29,5 @@ class OllamaLLM(BaseLLM):
         self.max_tokens = max_tokens
         self.stream = stream
+        self.base_url = base_url
 
     def __call__(
@@ -35,4 +38,5 @@ class OllamaLLM(BaseLLM):
         max_tokens: Optional[int] = None,
         stream: Optional[bool] = None,
+        base_url: Optional[str] = None,
     ) -> str:
         # Use instance defaults if not overridden
@@ -41,4 +45,5 @@ class OllamaLLM(BaseLLM):
         max_tokens = max_tokens if max_tokens is not None else self.max_tokens
         stream = stream if stream is not None else self.stream
+        base_url = base_url if base_url is not None else self.base_url
 
         try:
@@ -50,5 +55,5 @@ class OllamaLLM(BaseLLM):
                 "stream": stream,
             }
-            response = requests.post("http://localhost:11434/api/chat", json=payload)
+            response = requests.post(f"{base_url}/api/chat", json=payload)
             output = response.json()["message"]["content"]
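With this change, pointing the router at the container-hosted Ollama becomes a one-line change (again assuming the service name "ollama"), and a caller could also override base_url per call:

_llm = OllamaLLM(llm_name="mistral", base_url="http://ollama:11434")
rl = RouteLayer(encoder=encoder, routes=routes, llm=_llm)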
Here's a draft PR on my fork: https://github.com/prbarcelon/semantic-router/pull/1/files
What are the team's thoughts? Thank you!