OpenAI backend still creates HuggingFace-formatted requests
rggs opened this issue · 3 comments
rggs commented
I have llama.cpp running locally. Here's the relevant part of my settings.json:
"llm.configTemplate": "Custom",
"llm.fillInTheMiddle.enabled": true,
"llm.fillInTheMiddle.prefix": "<PRE> ",
"llm.fillInTheMiddle.middle": " <MID>",
"llm.fillInTheMiddle.suffix": " <SUF>",
"llm.contextWindow": 4096,
"llm.tokensToClear": [
"<EOT>"
],
"llm.tokenizer": {
"repository": "codellama/CodeLlama-13b-hf"
},
"llm.lsp.logLevel": "warn",
"llm.backend": "openai",
"llm.modelId": "CodeLlama70b",
"llm.url": "http://localhost:8080/v1/chat/completions",
However, looking at the request, it's still formatted as a HuggingFace request:
{"timestamp":1707926680,"level":"INFO","function":"log_server_request","line":2603,"message":"request","remote_addr":"127.0.0.1","remote_port":54364,"status":500,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1707926680,"level":"VERBOSE","function":"log_server_request","line":2608,"message":"request","request":"{\"model\":\"CodeLlama70b\",\"parameters\":{\"max_new_tokens\":60,\"temperature\":0.2,\"top_p\":0.95},\"prompt\":\"<PRE> import math\\n\\n# Here's a function that adds two numbers: <SUF> <MID>\",\"stream\":false}","response":"500 Internal Server Error\n[json.exception.type_error.302] type must be array, but is number"}
McPatate commented
It works with the /v1/completions API; I'm not sure it does with other endpoints.
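If the completions endpoint is the one that works, a possible workaround (an untested sketch, leaving everything else in the settings above unchanged) is to point llm.url there instead:

"llm.url": "http://localhost:8080/v1/completions",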
github-actions commented
This issue is stale because it has been open for 30 days with no activity.
McPatate commented
Closing for now, feel free to open another issue if you're still having difficulties making it work.