Error when running inference on Qwen2-72B-AWQ with the latest vLLM image
Tendo33 opened this issue · 4 comments
Tendo33 commented
The following items must be checked before submission
- Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
- I have read the project documentation and the FAQ, and I have searched the existing issues/discussions without finding a similar problem or solution.
Type of problem
Model inference and deployment
Operating system
Linux
Detailed description of the problem
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/workspace/api/vllm_routes/chat.py", line 121, in create_chat_completion
await get_guided_decoding_logits_processor(
TypeError: get_guided_decoding_logits_processor() missing 1 required positional argument: 'tokenizer'
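For context, the call in chat.py matches an older vLLM signature, while the installed release requires a tokenizer argument: that is the missing positional named in the traceback. Below is a minimal compatibility sketch, not the project's actual fix; only the parameter name tokenizer is confirmed by the error above, and the import path and remaining argument shape are assumptions about the 0.4.x series.

import inspect

from vllm.model_executor.guided_decoding import (
    get_guided_decoding_logits_processor,
)

async def build_guided_logits_processor(request, tokenizer):
    # Dispatch on the installed vLLM's signature so one call site
    # works across versions.
    params = inspect.signature(
        get_guided_decoding_logits_processor).parameters
    if "tokenizer" in params:
        # Newer vLLM: pass the tokenizer the traceback says is required.
        return await get_guided_decoding_logits_processor(
            request, tokenizer=tokenizer)
    # Older vLLM accepted the request alone.
    return await get_guided_decoding_logits_processor(request)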
Dependencies
The latest vLLM image and the latest code from this repository.
Runtime logs or screenshots
SETTINGS: {
"trust_remote_code": true,
"tokenize_mode": "auto",
"tensor_parallel_size": 1,
"gpu_memory_utilization": 0.95,
"max_num_batched_tokens": -1,
"max_num_seqs": 256,
"quantization_method": null,
"enforce_eager": false,
"max_seq_len_to_capture": 8192,
"max_loras": 1,
"max_lora_rank": 32,
"lora_extra_vocab_size": 256,
"lora_dtype": "auto",
"max_cpu_loras": -1,
"lora_modules": "",
"disable_custom_all_reduce": false,
"vllm_disable_log_stats": true,
"model_name": "Qwen2-72B-Instruct-AWQ",
"model_path": "/workspace/share_data/base_llms/Qwen2-72B-Instruct-AWQ",
"dtype": "auto",
"load_in_8bit": false,
"load_in_4bit": true,
"context_length": 14000,
"chat_template": "qwen2",
"rope_scaling": null,
"flash_attn": false,
"interrupt_requests": true,
"host": "0.0.0.0",
"port": 8000,
"api_prefix": "/v1",
"engine": "vllm",
"tasks": [
"llm"
],
"device_map": "auto",
"gpus": "0",
"num_gpus": 1,
"activate_inference": true,
"model_names": [
"Qwen2-72B-Instruct-AWQ"
],
"api_keys": null
}
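For reference, a request like the following should exercise the failing create_chat_completion handler. This is a hypothetical reproduction: the endpoint path assumes the OpenAI-style /v1/chat/completions route implied by api_prefix above.

import requests

# Values taken from the SETTINGS dump above (port 8000,
# api_prefix "/v1", model name from model_names).
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Qwen2-72B-Instruct-AWQ",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
print(resp.status_code, resp.text)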
xusenlinzy commented
Are you using vLLM version 0.4.3?
Tendo33 commented
Are you using vLLM version 0.4.3?

Yes, the image was built from the latest vLLM Dockerfile.
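For anyone double-checking, the version the image actually ships can be confirmed from inside the container; vllm exposes a standard version attribute:

import vllm
print(vllm.__version__)  # e.g. "0.4.3" for the image discussed here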
xusenlinzy commented
This error should only occur with vLLM 0.4.2. I just pushed a change; can you update the code and try again?
Tendo33 commented
I found a small bug: when using the chatml template, the stop words are None.
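Until the template supplies a default stop list, a possible workaround (an assumption, relying only on the standard OpenAI-style stop parameter, not a confirmed project feature) is to pass the ChatML end-of-turn token explicitly in the request body:

# Hypothetical request payload: supply stop strings yourself while the
# chatml template's default is None. "<|im_end|>" is ChatML's
# end-of-turn marker, also used by the Qwen2 chat models.
payload = {
    "model": "Qwen2-72B-Instruct-AWQ",
    "messages": [{"role": "user", "content": "Hello"}],
    "stop": ["<|im_end|>"],
}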