huggingface/llm-vscode

OpenAI backend still creates HuggingFace-formatted request

rggs opened this issue · 3 comments

rggs commented

I have llama.cpp running locally. Here's the relevant part of my settings.json:

    "llm.configTemplate": "Custom",
    "llm.fillInTheMiddle.enabled": true,
    "llm.fillInTheMiddle.prefix": "<PRE> ",
    "llm.fillInTheMiddle.middle": " <MID>",
    "llm.fillInTheMiddle.suffix": " <SUF>",
    "llm.contextWindow": 4096,
    "llm.tokensToClear": [
        "<EOT>"
    ],
    "llm.tokenizer": {
        "repository": "codellama/CodeLlama-13b-hf"
    },
    "llm.lsp.logLevel": "warn",
    "llm.backend": "openai",
    "llm.modelId": "CodeLlama70b",
    "llm.url": "http://localhost:8080/v1/chat/completions",

However, looking at the llama.cpp server log, the request is still formatted as a HuggingFace-style request:

{"timestamp":1707926680,"level":"INFO","function":"log_server_request","line":2603,"message":"request","remote_addr":"127.0.0.1","remote_port":54364,"status":500,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1707926680,"level":"VERBOSE","function":"log_server_request","line":2608,"message":"request","request":"{\"model\":\"CodeLlama70b\",\"parameters\":{\"max_new_tokens\":60,\"temperature\":0.2,\"top_p\":0.95},\"prompt\":\"<PRE> import math\\n\\n# Here's a function that adds two numbers: <SUF> <MID>\",\"stream\":false}","response":"500 Internal Server Error\n[json.exception.type_error.302] type must be array, but is number"}

It works with the /v1/completions API; I'm not sure it does with other endpoints.
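
Since /v1/completions takes a flat prompt, I'd expect a manual request along these lines to go through (a sketch using the standard library; parameter names follow the OpenAI completions schema, and the values simply mirror the settings and the FIM prompt from the logs above):

    import json
    import urllib.request

    body = {
        "model": "CodeLlama70b",
        "prompt": "<PRE> import math\n\n# Here's a function that adds two numbers: <SUF> <MID>",
        "max_tokens": 60,
        "temperature": 0.2,
        "top_p": 0.95,
        "stop": ["<EOT>"],
        "stream": False,
    }

    req = urllib.request.Request(
        "http://localhost:8080/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode("utf-8"))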

This issue is stale because it has been open for 30 days with no activity.

Closing for now; feel free to open another issue if you're still having difficulties making it work.