huggingface/llm-vscode

Inference api error: Service Unavailable

vanschroeder opened this issue · 9 comments

I am unable to make any use of llm-vscode. I get nothing but error messages: first "Inference api error: Service Unavailable", and then "http error: builder error: relative URL without a base". This looks like broken settings, but I have gone so far as to read the docs and manually set the llm.url setting, to no avail; the best I get is "Service Unavailable" again. I have also completely deleted VSCode and its settings files under the Application Support dir, rebooted, and reinstalled, only to get the same errors all over again.

llm-vscode was working at one point, but I can no longer use it. A Google search for these errors only points to some Rust issues in other projects, and the errors presented in VSCode are of no help at all. Anyway, I am posting this here and uninstalling the plugin to find other solutions. I would love to reevaluate if this issue ever gets resolved.

[Screenshot 2024-04-16 at 10 46 26 AM] [Screenshot 2024-04-16 at 11 01 22 AM]

System Data:

llm-vscode version: v0.2.0

VSCode Info:
Version: 1.88.1 (Universal)
Commit: e170252f762678dec6ca2cc69aba1570769a5d39
Date: 2024-04-10T17:42:52.765Z
Electron: 28.2.8
ElectronBuildId: 27744544
Chromium: 120.0.6099.291
Node.js: 18.18.2
V8: 12.0.267.19-electron.0
OS: Darwin arm64 23.4.0

MacOS version: 14.4 (23E214)
Processor: Mac M3 Pro (18 GB RAM)

same for me.

[Screen Shot 2024-05-02 at 7 53 10 AM] Same here too. Do we need a Pro account because of this, or are my specs too old?

MacOS version: 10.14.6
VS Code version:1.85.2

bkoz commented

This fails on both Mac and Linux platforms. Could it be related to the LLM service itself? Perhaps something has changed in the configuration?

Service Unavailable means either that the backend (in this case the Inference API, https://huggingface.co/docs/api-inference/en/index) is in a degraded state, or that it couldn't load your model.

Imo what happened is that starcoder is not loaded by default anymore now that starcoder2 has come out, and it can't be loaded because it is too big. When this happens, you need to either change the backend or change the model your extension is using. We (Hugging Face) could be clearer about which models are loaded manually in the Serverless API.
I've changed the default model configuration to use starcoder2; let me know if you're still facing issues after updating the extension.
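
As a rough illustration of the two cases above, the sketch below turns a raw HTTP response into a short diagnosis. The function name `diagnose_response` and the assumption that the backend reports failures with a 503 status and, optionally, a JSON body containing an `error` field are illustrative only; this is a debugging aid, not a description of how llm-vscode handles errors internally.

```python
import json


def diagnose_response(status_code: int, body: str) -> str:
    """Turn an HTTP status code and raw body into a short diagnosis string."""
    if status_code == 200:
        return "ok"

    # Try to pull a message out of a JSON body such as {"error": "..."}.
    # If the body is not JSON, fall back to the raw text.
    message = ""
    try:
        parsed = json.loads(body)
        if isinstance(parsed, dict):
            message = str(parsed.get("error", ""))
    except json.JSONDecodeError:
        message = body.strip()

    detail = message or "no detail returned"
    if status_code == 503:
        # Assumption: 503 covers both a degraded backend and a model
        # that is not (or cannot be) loaded.
        return f"service unavailable: {detail}"
    return f"http {status_code}: {detail}"
```

If the detail mentions the model, switching the model or the backend is the likely fix; if no detail is returned, the backend itself may be degraded.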

FWIW, I also seem to be having this error on my Linux platform, though I'm not sure if it's user error or related to this. Essentially, the plugin, using my inference endpoint (a finetuned starcoder2 model), is throwing inference api error: Service Unavailable.

I can confirm my endpoint is operational via the huggingface console. I can also confirm I can hit my endpoint using curl. My settings.json is simply:

{
    "llm.attributionEndpoint": "https://<endpoint-id>.us-east-1.aws.endpoints.huggingface.cloud"
}

Again, this could be user error; I wasn't sure if this helps or not. I'm just trying to get something simple running for the first time. Any response or help is appreciated 😅
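
For anyone wanting to repeat the curl check from a script, here is a minimal sketch using only the Python standard library. It assumes the endpoint accepts a text-generation style JSON body with `inputs` and `parameters` and a bearer token in the `Authorization` header; the helper names, the token value, and the prompt are placeholders, not values from this thread.

```python
import json
import urllib.request


def build_generate_request(endpoint_url: str, token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a text-generation POST request."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 20},
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Calling `send_request(build_generate_request(url, token, "def fibonacci(n):"))` with your own endpoint URL and token should return generated text if the endpoint is healthy. A success here only shows that the endpoint works, not that the extension is pointed at it.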

bkoz commented

I am able to get the extension working by setting the model ID to codellama/CodeLlama-13b-hf and the Config Template to hf/codellama/CodeLlama-13b-hf

I was able to get my extension working as well (user error). My settings.json is as follows; I had to update the modelId to match the base model my finetuned model was derived from:

{
    "[typescriptreact]": {
        "editor.defaultFormatter": "rvest.vs-code-prettier-eslint"
    },
    "llm.attributionEndpoint": "https://<endpoint-id>.us-east-1.aws.endpoints.huggingface.cloud",
    "cmake.showOptionsMovedNotification": false,
    "llm.modelId": "bigcode/starcoder2-15b",
    "llm.requestBody": {
        "parameters": {
            "max_new_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    }
}

@JShep-tri attributionEndpoint is unrelated to the actual endpoint being called for completions. It is used to check if selected code is present in a dataset or not with the Llm: Show Code Attribution command.

In your case, you will need to do the following to configure llm-vscode to hit your endpoint:

{
    "llm.url": "https://<endpoint-id>.us-east-1.aws.endpoints.huggingface.cloud",
    "llm.backend": "tgi", // or whatever is powering inference, I assume it is TGI given you are using our dedicated inference solution
    "llm.modelId": "bigcode/starcoder2-15b",
    "llm.requestBody": {
        "parameters": {
            "max_new_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    }
}

I'd suggest you take a look at llm.nvim's README, as this is explained a bit better there than in llm-vscode's README (and the two are quite similar).
The reason you got it working is that the default backend is huggingface, which uses the model id to route the request to the correct serverless endpoint.

Let me know if you need additional help; in the meantime, I'll close the issue.
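
To make the routing described above concrete, here is a simplified model of it in Python. This is not the extension's actual code: `resolve_completion_url` is a hypothetical helper, `SERVERLESS_BASE` is an assumed base URL for the serverless API, and the rule that an explicit `llm.url` takes precedence is a simplification. The point it illustrates is that `llm.attributionEndpoint` plays no part in choosing where completions are sent.

```python
# Assumption: base URL of the serverless Inference API at the time of this thread.
SERVERLESS_BASE = "https://api-inference.huggingface.co/models"


def resolve_completion_url(settings: dict) -> str:
    """Pick the URL that completion requests would be sent to.

    Simplified model: an explicit llm.url wins; otherwise the default
    "huggingface" backend derives the URL from llm.modelId.
    Note that llm.attributionEndpoint is never consulted.
    """
    backend = settings.get("llm.backend", "huggingface")
    url = settings.get("llm.url")
    if url:
        return url
    if backend == "huggingface":
        model_id = settings.get("llm.modelId")
        if not model_id:
            raise ValueError("llm.modelId is required for the huggingface backend")
        return f"{SERVERLESS_BASE}/{model_id}"
    raise ValueError(f"backend {backend!r} needs an explicit llm.url")
```

Under this model, a config that sets only `llm.attributionEndpoint` and `llm.modelId` resolves to the serverless endpoint for that model id, while a config with `llm.url` and `"llm.backend": "tgi"` resolves to the dedicated endpoint.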

Thank you @McPatate, that worked!