huggingface/llm-ls

feat: add support for ollama

McPatate opened this issue · 3 comments

Asked by @gtnbssn in huggingface/llm.nvim#43.

  • add a flag to differentiate between different APIs
  • add parsing ollama response

docs: https://github.com/jmorganca/ollama/blob/main/docs/api.md
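For context, Ollama's /api/generate endpoint streams newline-delimited JSON chunks, each carrying a "response" fragment and a "done" flag (see the docs linked above). A rough sketch of what the parsing could look like, assuming we simply concatenate the chunks into one generated_text (the struct and function names here are illustrative, not the final API):

use serde::Deserialize;

// One line of ollama's streaming /api/generate output (illustrative subset of fields).
#[derive(Deserialize)]
struct OllamaChunk {
    response: String,
    done: bool,
}

// Concatenate the streamed chunks into a single completion string.
fn parse_ollama_response(raw: &str) -> Result<String, serde_json::Error> {
    let mut generated_text = String::new();
    for line in raw.lines().filter(|l| !l.trim().is_empty()) {
        let chunk: OllamaChunk = serde_json::from_str(line)?;
        generated_text.push_str(&chunk.response);
        if chunk.done {
            break;
        }
    }
    Ok(generated_text)
}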

Hi @McPatate, I’m the maintainer of LiteLLM - we let you create a proxy server to call 100+ LLMs, and I think it can solve your problem (I'd love your feedback if it doesn't).

Try it here: https://docs.litellm.ai/docs/proxy_server

Using LiteLLM Proxy Server

import openai
openai.api_base = "http://0.0.0.0:8000/" # proxy url
print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))

Creating a proxy server

Ollama models

$ litellm --model ollama/llama2 --api_base http://localhost:11434

Hugging Face Models

$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --model huggingface/bigcode/starcoder

Anthropic

$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1

Palm

$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison

If it were a Rust crate, why not, but I'm not adding a proxy to the project. It would add a Python dependency for users, and I don't like the extra process.

I'm not all that familiar with Rust, but when the request is built in request_completion, would it be reasonable to use a dynamic property name?

async fn request_completion(
    http_client: &reqwest::Client,
    ide: Ide,
    model: &str,
    request_params: RequestParams,
    api_token: Option<&String>,
    prompt: String,
    inputs_key: String,
    request_options: HashMap<String, String>,
) -> Result<Vec<Generation>> {
    // Build the body dynamically so the prompt can go under whatever key the
    // backend expects ("inputs" for the HF API, "prompt" for ollama, ...).
    let mut body: HashMap<String, serde_json::Value> = HashMap::new();
    body.extend(
        request_options
            .into_iter()
            .map(|(k, v)| (k, serde_json::Value::String(v))),
    );
    body.insert(inputs_key, serde_json::Value::String(prompt));
    body.insert(
        "parameters".to_owned(),
        serde_json::to_value(request_params).map_err(internal_error)?,
    );

    let res = http_client
        .post(build_url(model))
        .json(&body)
        .headers(build_headers(api_token, ide)?)
        .send()
        .await
        .map_err(internal_error)?;

    // ...
}

Just an example, but we could add inputs_key and request_options to CompletionParams. To get this working for ollama, a user could set inputs_key to "prompt" and request_options to { model: "ollama:7b-code" }.
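Concretely (sticking with the hypothetical inputs_key / request_options names above), the ollama configuration would boil down to something like this on the Rust side:

use std::collections::HashMap;

// Hypothetical user-supplied configuration for an ollama backend.
let inputs_key = "prompt".to_owned();
let request_options: HashMap<String, String> =
    HashMap::from([("model".to_owned(), "ollama:7b-code".to_owned())]);

// request_completion would then serialize a body roughly like:
// { "model": "ollama:7b-code", "prompt": "<code before the cursor>", "parameters": { ... } }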


Also, as an aside, I don't get why we wouldn't pass the params as a whole to the request_completion call. Would this be bad practice in Rust?

// Before
let result = request_completion(
    http_client,
    params.ide,
    params.model,
    params.request_params,
    params.api_token.as_ref(),
    prompt,
)
.await?;

// After
let result = request_completion(http_client, params, prompt).await?;