xinference: make qwen2.5-coder-instruct support tool calls
System Info
NVIDIA A40
Running Xinference with Docker?
- docker
- pip install
- installation from source
Version info
docker pull xprobe/xinference:v1.1.0
The command used to start Xinference
Engine: vLLM
Download hub: modelscope
(Note: I can't set this through the extra vLLM parameters, because --enable-auto-tool-choice is a flag whose key takes no value.)
# Additional parameters passed to the inference engine: vLLM
# --enable-auto-tool-choice
# --tool-call-parser hermes
Reproduction
curl -X POST http://0.0.0.0:9997/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "杭州天气怎么样"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_time",
          "description": "当你想知道现在的时间时非常有用。",
          "parameters": {}
        }
      },
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "当你想查询指定城市的天气时非常有用。",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "城市或县区,比如北京市、杭州市、余杭区等。"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
{"detail":"Only ['qwen1.5-chat', 'qwen1.5-moe-chat', 'qwen2-instruct', 'qwen2-moe-instruct', 'qwen2.5-instruct', 'glm4-chat', 'glm4-chat-1m', 'llama-3.1-instruct'] support tool calls"}
Expected behavior
Support for the tools capability, i.e. being able to pass these additional parameters to the vLLM engine:
# --enable-auto-tool-choice
# --tool-call-parser hermes
Have you tried Xinference's built-in function calling?
Is there an example?
https://inference.readthedocs.io/zh-cn/latest/models/model_abilities/tools.html#
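For reference, the example in those docs boils down to a call like the following through the OpenAI-compatible client. This is a minimal sketch; the endpoint and the model UID "qwen2.5-coder-instruct" are taken from the curl commands in this thread, and the tool definition mirrors the one in the reproduction payload.

import openai

client = openai.Client(api_key="not empty", base_url="http://0.0.0.0:9997/v1")

response = client.chat.completions.create(
    model="qwen2.5-coder-instruct",
    messages=[{"role": "user", "content": "杭州天气怎么样"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Useful when you want the weather for a given city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "A city or district, e.g. Hangzhou.",
                        }
                    },
                    "required": ["location"],
                },
            },
        }
    ],
)
# With a supported model family, the call comes back as structured data here:
print(response.choices[0].message.tool_calls)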
I tried this example and it still doesn't work:
{"detail":"Only ['qwen1.5-chat', 'qwen1.5-moe-chat', 'qwen2-instruct', 'qwen2-moe-instruct', 'qwen2.5-instruct', 'glm4-chat', 'glm4-chat-1m', 'llama-3.1-instruct'] support tool calls"}
@codingl2k1 We need to add qwen2.5-coder-instruct.
inference/xinference/model/llm/utils.py, lines 49 to 54 at b0b2fa6
This spot was missed; you can try modifying the source to see whether that fixes it. If it does, feel free to submit a PR.
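For illustration, the suggested change is along these lines. This is only a sketch: the constant name below is an assumption inferred from the error message, so check the actual source at the location referenced above.

# Sketch of the fix in xinference/model/llm/utils.py (lines 49 to 54 at
# b0b2fa6): append the model family to the list that gates tool-call support.
# QWEN_TOOL_CALL_FAMILY is an assumed name, inferred from the error message.
QWEN_TOOL_CALL_FAMILY = [
    "qwen1.5-chat",
    "qwen1.5-moe-chat",
    "qwen2-instruct",
    "qwen2-moe-instruct",
    "qwen2.5-instruct",
    "qwen2.5-coder-instruct",  # added: lets this model pass the tool-call check
]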
After modifying the code, tools can be used, but the returned format is incorrect. I tested vLLM launched directly, plus Ollama and vLLM launched through Xinference; the problem shows up in all of them, so it is probably a model issue.
curl -X POST http://0.0.0.0:9997/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "qwen2.5-coder-instruct",
"messages": [
{
"role": "user",
"content": "杭州天气怎么样"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_time",
"description": "当你想知道现在的时间时非常有用。",
"parameters": {}
}
},
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "当你想查询指定城市的天气时非常有用。",
"parameters": {
"type": "object",
"properties": {
"location":{
"type": "string",
"description": "城市或县区,比如北京市、杭州市、余杭区等。"
}
},
"required": ["location"]
}
}
}
]
}'
{"id":"chatcmpl-02f6c7f8-d76d-41fc-99c8-961b44e04a44","model":"qwen2.5-coder-instruct","object":"chat.completion","created":1734342553,"choices":[{"index":0,"message":{"role":"assistant","content":"{{"name": "get_current_weather", "arguments": {"location": "杭州"}}","tool_calls":[]},"finish_reason":"stop"}],"usage":{"prompt_tokens":449,"completion_tokens":19,"total_tokens":468}}
Try a larger model size; the small ones are a bit weak at function calling.
OK.
I tested it, and adding the model works. I also noticed the official qwen2-vl docs say it supports function calling, so I added that too, but in testing the tool call never got invoked, seemingly because the official chat template doesn't handle tool calls.
You can submit a PR for qwen2.5-coder-instruct first; qwen2-vl would probably need chat template changes, which is more involved.