LLM chat and autocomplete can not work using Xinference api

Question

LLM chat and autocomplete can not work using Xinference api

Closed this issue 8 days ago · 0 comments

Before submitting your bug report

I believe this is a bug. I'll try to join the Continue Discord for questions
I'm not able to find an open issue that reports the same bug
I've seen the troubleshooting guide on the Continue Docs

Relevant environment info

- OS: ubuntu
- Continue version: 0.8.61
- IDE version: VScode 1.83.0
- Model: Qwen2.5-32B-Instruct, Qwen2.5-coder-32B-Instruct, Qwen2.5-coder-14B
- config.json:

Description

I ran a Xinference as model provider, and model provider is openai

Here are configurations, Qwen2-Instruct model is running well,
but Qwen2.5-32B-Instruct, Qwen2.5-Coder-32B-Instruct, Qwen2.5-Coder-14B can not work.
When chat with them, never print this response, but I'm sure that model had been answered.

Difference configuration is:
Xinference for Qwen2.5 model was upgrade to latest, but I checked API /v1/chat/compeletion is nothing changed.

  "models": [
    {
      "title": "Qwen2.5-Coder-Instruct",
      "model": "Qwen2.5-32B-Instruct",
      "systemMessage": "You are an expert software developer. You give helpful and concise responses.",
      "apiBase": "https://xinference-qwen25-32b-instruct/v1",
      "apiKey": "aaaaaa",
      "provider": "openai",
      "useLegacyCompletionsEndpoint": false
    },
    {
      "title": "Qwen2-Instruct",
      "model": "Qwen2-7B-Instruct",
      "systemMessage": "You are an expert software developer. You give helpful and concise responses.",
      "apiBase": "https://xinference-qwen2/v1",
      "apiKey": "aaaaaa",
      "provider": "openai",
      "useLegacyCompletionsEndpoint": false
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5-coder",
    "provider": "openai",
    "model": "Qwen2.5-Coder-14B",
    "apiBase": "https://xinference-qwen25-coder-14b/v1",
    "apiKey": "aaaaaa",
    "systemMessage": "You are an expert software developer. You give helpful and concise responses."
  },

I can not get output from continue, what maybe the root case?

To reproduce

No response

Log output

==========================================================================
##### Completion options #####
{
  "contextLength": 8096,
  "model": "Qwen2.5-32B-Instruct",
  "maxTokens": 4096
}

##### Request options #####
{}

##### Prompt #####
<system>
You are an expert software developer. You give helpful and concise responses.

<user>
Who are you?

<assistant>
I am a sophisticated AI designed to assist with information, guidance, and problem-solving tasks related to software development and other topics. I'm here to help answer your questions, provide code examples, explain concepts, and much more. How can I assist you today?

<user>
Who are you?

==========================================================================
==========================================================================
##### Completion options #####
{
  "contextLength": 8096,
  "model": "Qwen2.5-32B-Instruct",
  "maxTokens": 4096
}

##### Request options #####
{}

##### Prompt #####
<system>
You are an expert software developer. You give helpful and concise responses.

<user>
Who are you?

==========================================================================