ollama functions should not have a default value for keepAlive
rick-github opened this issue · 3 comments
Checked other resources
- I added a very descriptive title to this issue.
- I searched the LangChain.js documentation with the integrated search.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain.js rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
import { ChatOllama } from "@langchain/community/chat_models/ollama";
import { HumanMessage } from "@langchain/core/messages";

async function main() {
  // Optional keepAlive duration in seconds, taken from the command line.
  const duration = process.argv[2] ? parseInt(process.argv[2], 10) : undefined;

  const config: any = {
    baseUrl: "http://localhost:11434",
    model: "llama3.2",
  };

  // Only pass keepAlive when it was explicitly provided.
  if (duration !== undefined) {
    config.keepAlive = duration;
  }

  const chat = new ChatOllama(config);

  try {
    const response = await chat.invoke([new HumanMessage("2+2=?")]);
    console.log("Response:", response.content);
  } catch (error) {
    console.error("Error:", error);
  }
}

main();
Error Message and Stack Trace (if applicable)
No response
Description
Projects using ChatOllama have a default keepAlive added if the client has not set it. This overrides the value set by other clients or the server. If the ChatOllama client doesn't explicitly set keepAlive, the langchainjs library should not set it.
For example, by default my ollama server is set to never unload a model:
$ curl -s localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"2+2=?","stream":false}' | jq .response
"2 + 2 = 4"
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     Forever
If I use the langchainjs library and explicitly set a keepAlive duration, it works as expected:
$ 2>&- node --loader ts-node/esm ollama.ts 1800
Response: 2 + 2 = 4
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now
If the langchainjs client doesn't set a keepAlive value, the expectation is that the value previously set for the model remains unchanged, but that's not the case:
$ 2>&- node --loader ts-node/esm ollama.ts
Response: 2 + 2 = 4.
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     4 minutes from now
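What I'd expect is that the library only forwards keep_alive to the Ollama API when the caller actually set it, so the server's own default (e.g. OLLAMA_KEEP_ALIVE) stays in effect otherwise. A minimal sketch of that idea in TypeScript; the type and helper names below are illustrative assumptions, not the library's actual internals:

// Illustrative sketch only: omit keep_alive from the request body unless the
// caller supplied a value, leaving the server-side default untouched.
interface GenerateRequest {
  model: string;
  prompt: string;
  keep_alive?: string | number;
}

function buildGenerateRequest(
  model: string,
  prompt: string,
  keepAlive?: string | number
): GenerateRequest {
  const request: GenerateRequest = { model, prompt };
  if (keepAlive !== undefined) {
    request.keep_alive = keepAlive;
  }
  return request;
}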
System Info
└─ langchain@npm:0.3.6
├─ Instances: 1
├─ Version: 0.3.6
│
└─ Dependencies
├─ @langchain/openai@npm:>=0.1.0 <0.4.0 → npm:0.3.14
├─ @langchain/textsplitters@npm:>=0.0.0 <0.2.0 → npm:0.1.0
├─ js-tiktoken@npm:^1.0.12 → npm:1.0.15
├─ js-yaml@npm:^4.1.0 → npm:4.1.0
├─ jsonpointer@npm:^5.0.1 → npm:5.0.1
├─ langsmith@npm:^0.2.0 → npm:0.2.8
├─ openapi-types@npm:^12.1.3 → npm:12.1.3
├─ p-retry@npm:4 → npm:4.6.2
├─ uuid@npm:^10.0.0 → npm:10.0.0
├─ yaml@npm:^2.2.1 → npm:2.6.1
├─ zod-to-json-schema@npm:^3.22.3 → npm:3.23.5
└─ zod@npm:^3.22.4 → npm:3.23.8
platform: linux
$ node --version
v20.9.0
$ yarn --version
3.5.1
Hey @rick-github! I'm here to help you with any bugs, questions, or if you'd like to become a contributor. Let's tackle this issue together!
I found a similar discussion regarding the keepAlive setting in LangChain.js. The issue was that the library sets a default keepAlive value of "5m" for ChatOllama when the client does not explicitly set it. This behavior can override the server's settings, which might not be desirable in all cases [1].
In the discussion, it was noted that upgrading to Node.js version 20 resolved a related issue with the UND_ERR_HEADERS_TIMEOUT error, which might be relevant if you're experiencing similar timeout issues [2].
If you want to avoid the default keepAlive setting, you might consider explicitly setting it to undefined or another value that aligns with your server's configuration. However, as of now, there doesn't seem to be a built-in option in LangChain.js to disable the default keepAlive behavior entirely.
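For example, one possible workaround (a sketch, assuming keepAlive accepts a Go-style duration string that is forwarded to Ollama's keep_alive) is to mirror your server's setting explicitly so the "5m" default never takes effect:

import { ChatOllama } from "@langchain/community/chat_models/ollama";

// Workaround sketch: set keepAlive explicitly so the library default never
// applies. A negative duration such as "-1m" asks Ollama to keep the model
// loaded indefinitely; adjust the value to match your server configuration.
const chat = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "llama3.2",
  keepAlive: "-1m",
});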
For comparison, the ollama python client library (maintained by the ollama project) performs as expected:
$ ollama run llama3.2 '2+2=?'
2 + 2 = 4
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     Forever
$ python3 -c 'import ollama;print(ollama.Client().generate(model="llama3.2",prompt="2+2?",keep_alive=1800)["response"])'
The answer to 2 + 2 is 4.
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now
$ python3 -c 'import ollama;print(ollama.Client().generate(model="llama3.2",prompt="2+2?")["response"])'
2 + 2 = 4
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now
$ python3 -c 'import ollama;print(ollama.Client().chat(keep_alive=1800,model="llama3.2",messages=[{"role":"user","content":"2+2=?"}])["message"]["content"])'
2 + 2 = 4
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now
$ python3 -c 'import ollama;print(ollama.Client().chat(model="llama3.2",messages=[{"role":"user","content":"2+2=?"}])["message"]["content"])'
2 + 2 = 4.
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now
Gotcha, will have a look unless @bracesproul beats me to it!