langchain-ai/langchainjs

ollama functions should not have a default value for keepAlive

rick-github opened this issue · 3 comments

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import { ChatOllama } from "@langchain/community/chat_models/ollama";
import { HumanMessage } from "@langchain/core/messages";

async function main() {
  // Optional keep-alive duration (seconds) taken from the command line.
  const duration = process.argv[2] ? parseInt(process.argv[2], 10) : undefined;

  const config: any = {
    baseUrl: "http://localhost:11434",
    model: "llama3.2",
  };

  // Only set keepAlive when a duration was passed, so the default
  // behaviour of ChatOllama can be compared against an explicit value.
  if (duration !== undefined) {
    config.keepAlive = duration;
  }

  const chat = new ChatOllama(config);

  try {
    const response = await chat.invoke([
      new HumanMessage("2+2=?"),
    ]);
    console.log("Response:", response.content);
  } catch (error) {
    console.error("Error:", error);
  }
}

main();

Error Message and Stack Trace (if applicable)

No response

Description

When a client does not set keepAlive, ChatOllama adds a default value to the request. This overrides the value configured by other clients or by the server itself. If the ChatOllama client doesn't explicitly set keepAlive, the langchainjs library should not send one at all (see the sketch after the examples below).

For example, by default my ollama server is set to never unload a model:

$ curl -s localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"2+2=?","stream":false}' | jq .response
"2 + 2 = 4"
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL   
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     Forever    

If I use the langchainjs library and explicitly set a timeout, it works as expected:

$ 2>&- node --loader ts-node/esm ollama.ts 1800
Response: 2 + 2 = 4
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL               
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now    

If the langchainjs client doesn't set a keepAlive value, the expectation is that the value previously set for the model remains unchanged, but that's not the case:

$ 2>&- node --loader ts-node/esm ollama.ts 
Response: 2 + 2 = 4.
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL              
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     4 minutes from now    
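
The behaviour being asked for, as a rough sketch in plain TypeScript against Ollama's REST API (the names here are illustrative only and are not the actual langchainjs internals): keep_alive is only forwarded when the caller explicitly supplied a value, so the server's own setting is left untouched.

// Illustrative only: not the actual ChatOllama implementation.
// The point is to omit keep_alive from the request body entirely
// when the caller never supplied a value, so the Ollama server's
// existing setting (e.g. "never unload") is left alone.
interface GenerateOptions {
  model: string;
  prompt: string;
  keepAlive?: string | number; // undefined means "don't send keep_alive"
}

async function generate(baseUrl: string, opts: GenerateOptions) {
  const body: Record<string, unknown> = {
    model: opts.model,
    prompt: opts.prompt,
    stream: false,
  };
  // Only forward keep_alive when the caller set it explicitly.
  if (opts.keepAlive !== undefined) {
    body.keep_alive = opts.keepAlive;
  }
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return res.json();
}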

System Info

└─ langchain@npm:0.3.6
   ├─ Instances: 1
   ├─ Version: 0.3.6
   │
   └─ Dependencies
      ├─ @langchain/openai@npm:>=0.1.0 <0.4.0 → npm:0.3.14
      ├─ @langchain/textsplitters@npm:>=0.0.0 <0.2.0 → npm:0.1.0
      ├─ js-tiktoken@npm:^1.0.12 → npm:1.0.15
      ├─ js-yaml@npm:^4.1.0 → npm:4.1.0
      ├─ jsonpointer@npm:^5.0.1 → npm:5.0.1
      ├─ langsmith@npm:^0.2.0 → npm:0.2.8
      ├─ openapi-types@npm:^12.1.3 → npm:12.1.3
      ├─ p-retry@npm:4 → npm:4.6.2
      ├─ uuid@npm:^10.0.0 → npm:10.0.0
      ├─ yaml@npm:^2.2.1 → npm:2.6.1
      ├─ zod-to-json-schema@npm:^3.22.3 → npm:3.23.5
      └─ zod@npm:^3.22.4 → npm:3.23.8

platform: linux

$ node --version
v20.9.0
$ yarn --version
3.5.1

Hey @rick-github! I'm here to help you with any bugs, questions, or if you'd like to become a contributor. Let's tackle this issue together!

I found a similar discussion regarding the keepAlive setting in LangChain.js. The issue was that the library sets a default keepAlive value for ChatOllama to "5m" when the client does not explicitly set it. This behavior can override the server's settings, which might not be desirable in all cases [1].

In the discussion, it was noted that upgrading to Node.js version 20 resolved a related issue with the UND_ERR_HEADERS_TIMEOUT error, which might be relevant if you're experiencing similar timeout issues [2].

If you want to avoid the default keepAlive setting, you might consider explicitly setting it to undefined or another value that aligns with your server's configuration. However, as of now, there doesn't seem to be a built-in option in LangChain.js to disable the default keepAlive behavior entirely.
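
For example, a workaround sketch (not an official recommendation): Ollama treats a negative keep_alive as "keep the model loaded indefinitely", so explicitly passing something like "-1m" should mirror a server that is configured to never unload.

import { ChatOllama } from "@langchain/community/chat_models/ollama";

// Workaround sketch: explicitly pass a keepAlive that matches the
// server's intent. Ollama interprets a negative keep_alive as
// "never unload", so "-1m" mirrors a server configured to keep
// the model loaded indefinitely.
const chat = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "llama3.2",
  keepAlive: "-1m",
});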

To continue talking to Dosu, mention @dosu.

For comparison, the ollama Python client library (maintained by the Ollama project) behaves as expected:

$ ollama run llama3.2 '2+2=?'
2 + 2 = 4
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL   
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     Forever    
$ python3 -c 'import ollama;print(ollama.Client().generate(model="llama3.2",prompt="2+2?",keep_alive=1800)["response"])'
The answer to 2 + 2 is 4.
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL               
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now    
$ python3 -c 'import ollama;print(ollama.Client().generate(model="llama3.2",prompt="2+2?")["response"])'
2 + 2 = 4
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL               
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now   
$ python3 -c 'import ollama;print(ollama.Client().chat(keep_alive=1800,model="llama3.2",messages=[{"role":"user","content":"2+2=?"}])["message"]["content"])'
2 + 2 = 4
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL               
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now    
$ python3 -c 'import ollama;print(ollama.Client().chat(model="llama3.2",messages=[{"role":"user","content":"2+2=?"}])["message"]["content"])'
2 + 2 = 4.
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL               
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     29 minutes from now  

Gotcha, will have a look unless @bracesproul beats me to it!