huggingface/llm-vscode

OpenAI backend still creates HuggingFace-formatted request

rggs opened this issue · 3 comments

rggs commented

I have llama.cpp running locally. Here's the relevant part of my settings.json:

    "llm.configTemplate": "Custom",
    "llm.fillInTheMiddle.enabled": true,
    "llm.fillInTheMiddle.prefix": "<PRE> ",
    "llm.fillInTheMiddle.middle": " <MID>",
    "llm.fillInTheMiddle.suffix": " <SUF>",
    "llm.contextWindow": 4096,
    "llm.tokensToClear": [
        "<EOT>"
    ],
    "llm.tokenizer": {
        "repository": "codellama/CodeLlama-13b-hf"
    },
    "llm.lsp.logLevel": "warn",
    "llm.backend": "openai",
    "llm.modelId": "CodeLlama70b",
    "llm.url": "http://localhost:8080/v1/chat/completions",

However, looking at the llama.cpp server log, the request is still formatted as a HuggingFace-style request:

{"timestamp":1707926680,"level":"INFO","function":"log_server_request","line":2603,"message":"request","remote_addr":"127.0.0.1","remote_port":54364,"status":500,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1707926680,"level":"VERBOSE","function":"log_server_request","line":2608,"message":"request","request":"{\"model\":\"CodeLlama70b\",\"parameters\":{\"max_new_tokens\":60,\"temperature\":0.2,\"top_p\":0.95},\"prompt\":\"<PRE> import math\\n\\n# Here's a function that adds two numbers: <SUF> <MID>\",\"stream\":false}","response":"500 Internal Server Error\n[json.exception.type_error.302] type must be array, but is number"}

It works with the /v1/completions API; I'm not sure it does with other endpoints.
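
Since /v1/completions takes a flat prompt, I'd expect a manual request along these lines to go through (a sketch using the standard library; parameter names follow the OpenAI completions schema, and the values simply mirror the settings and the FIM prompt from the logs above):

    import json
    import urllib.request

    body = {
        "model": "CodeLlama70b",
        "prompt": "<PRE> import math\n\n# Here's a function that adds two numbers: <SUF> <MID>",
        "max_tokens": 60,
        "temperature": 0.2,
        "top_p": 0.95,
        "stop": ["<EOT>"],
        "stream": False,
    }

    req = urllib.request.Request(
        "http://localhost:8080/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode("utf-8"))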

This issue is stale because it has been open for 30 days with no activity.

Closing for now; feel free to open another issue if you're still having difficulties making it work.