xorbitsai/inference

xinference: qwen2.5-coder-instruct should support tool calls

Closed this issue · 11 comments

System Info

nvidia A40

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

docker pull xprobe/xinference:v1.1.0

The command used to start Xinference

(screenshot of the launch configuration)
Engine: vLLM
Download hub: modelscope

I can't set this via the vLLM extra-parameters field, because --enable-auto-tool-choice is a flag whose key takes no value:

# Additional parameters passed to the inference engine: vLLM
# --enable-auto-tool-choice
# --tool-call-parser hermes \
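For reference, when launching vLLM's OpenAI-compatible server directly, these are bare boolean flags rather than key=value pairs — a sketch, where the model path is my own assumption, not taken from this issue:

```shell
# Standalone vLLM launch; --enable-auto-tool-choice takes no value,
# which is why it cannot be entered in a key/value parameter field.
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```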

Reproduction

curl -X POST http://0.0.0.0:9997/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "杭州天气怎么样"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_time",
          "description": "当你想知道现在的时间时非常有用。",
          "parameters": {}
        }
      },
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "当你想查询指定城市的天气时非常有用。",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "城市或县区,比如北京市、杭州市、余杭区等。"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

{"detail":"Only ['qwen1.5-chat', 'qwen1.5-moe-chat', 'qwen2-instruct', 'qwen2-moe-instruct', 'qwen2.5-instruct', 'glm4-chat', 'glm4-chat-1m', 'llama-3.1-instruct'] support tool calls"}

Expected behavior

Tool calling should be supported for qwen2.5-coder-instruct.

Additional parameters passed to the inference engine: vLLM
# --enable-auto-tool-choice
# --tool-call-parser hermes \

Have you tried Xinference's built-in function calling?

Have you tried Xinference's built-in function calling?

Is there an example?

Have you tried Xinference's built-in function calling?

https://inference.readthedocs.io/zh-cn/latest/models/model_abilities/tools.html#

Using that example doesn't work either:

{"detail":"Only ['qwen1.5-chat', 'qwen1.5-moe-chat', 'qwen2-instruct', 'qwen2-moe-instruct', 'qwen2.5-instruct', 'glm4-chat', 'glm4-chat-1m', 'llama-3.1-instruct'] support tool calls"}

@codingl2k1 We need to add qwen2.5-coder-instruct.

QWEN_TOOL_CALL_FAMILY = [
    "qwen1.5-chat",
    "qwen1.5-moe-chat",
    "qwen2-instruct",
    "qwen2-moe-instruct",
    "qwen2.5-instruct",

This entry was missed here. You can try modifying the source to see whether it works; if it does, feel free to submit a PR.
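A minimal sketch of the suggested change, assuming the allowlist is the one quoted above (the exact source file is not shown in this thread, and the original list continues past the entries quoted here):

```python
# Sketch of the fix: extend the allowlist that gates tool-call support for
# Qwen models. All entries except the appended one come from the snippet
# quoted above.
QWEN_TOOL_CALL_FAMILY = [
    "qwen1.5-chat",
    "qwen1.5-moe-chat",
    "qwen2-instruct",
    "qwen2-moe-instruct",
    "qwen2.5-instruct",
    "qwen2.5-coder-instruct",  # newly added so the server no longer rejects it
]
```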

After modifying the code, tools can be used, but the returned format is incorrect. I tested vLLM standalone, plus Ollama and vLLM launched through xinference — the problem appears in all of them, so it is probably a model issue.

curl -X POST http://0.0.0.0:9997/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-instruct",
    "messages": [
      {"role": "user", "content": "杭州天气怎么样"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_time",
          "description": "当你想知道现在的时间时非常有用。",
          "parameters": {}
        }
      },
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "当你想查询指定城市的天气时非常有用。",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "城市或县区,比如北京市、杭州市、余杭区等。"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
{"id":"chatcmpl-02f6c7f8-d76d-41fc-99c8-961b44e04a44","model":"qwen2.5-coder-instruct","object":"chat.completion","created":1734342553,"choices":[{"index":0,"message":{"role":"assistant","content":"{{"name": "get_current_weather", "arguments": {"location": "杭州"}}","tool_calls":[]},"finish_reason":"stop"}],"usage":{"prompt_tokens":449,"completion_tokens":19,"total_tokens":468}}
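Note that in the response above the model emitted the tool call as (malformed) plain text in `content` while `tool_calls` stayed empty. A best-effort client-side recovery could look like this sketch — `extract_tool_call` is my own helper name, not part of any xinference or OpenAI API:

```python
import json
import re


def extract_tool_call(content: str):
    """Best-effort recovery of a tool call that the model emitted as plain
    text instead of a structured tool_calls entry. Tolerates the unbalanced
    braces seen in the response above (a leading '{{' with no matching close)."""
    # Pull out the "name" field directly rather than parsing the whole
    # (possibly broken) JSON object.
    name_match = re.search(r'"name"\s*:\s*"([^"]+)"', content)
    if not name_match:
        return None
    arguments = {}
    # Grab the innermost object following "arguments" and parse just that.
    args_match = re.search(r'"arguments"\s*:\s*(\{.*?\})', content, re.DOTALL)
    if args_match:
        try:
            arguments = json.loads(args_match.group(1))
        except json.JSONDecodeError:
            pass  # leave arguments empty if even the inner object is broken
    return {"name": name_match.group(1), "arguments": arguments}
```

This only papers over the symptom on the client side; as the thread concludes, the root cause is the model's formatting, not xinference.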

Try a larger model size; the small ones are a bit weak at function calling.

@codingl2k1 We need to add qwen2.5-coder-instruct.

OK

I tested it, and adding the entry works. I also saw that qwen2-vl's official docs say it supports function calling, so I added that as well, but in testing the tool call never fired, apparently because the official chat template has no tool-call support.


You can open a PR for qwen2.5-coder-instruct first; qwen2-vl will probably need a chat-template change, which is more involved.