Error when launching server.py with vLLM
whm233 opened this issue · 1 comment
whm233 commented
The following items must be checked before submission
- Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
- I have read the project documentation and FAQ and searched the existing issues/discussions without finding a similar problem or solution.
Type of problem
None
Operating system
None
Detailed description of the problem
The model's max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (4512). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
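For context, this error comes from a sanity check vLLM performs at engine startup: the model's maximum sequence length must fit within the KV cache that was allocated on the GPU. The sketch below is illustrative only — the function name and the block size of 16 are assumptions, not vLLM's actual internals — but it reproduces the arithmetic behind the message:

```python
# Illustrative sketch of vLLM's KV-cache capacity check.
# The helper name and block_size default are assumptions, not vLLM code.

def check_kv_cache_fits(max_model_len: int, num_gpu_blocks: int,
                        block_size: int = 16) -> bool:
    """Raise ValueError if the model's max seq len exceeds KV-cache capacity."""
    # Total tokens the KV cache can hold: blocks * tokens-per-block.
    max_kv_tokens = num_gpu_blocks * block_size
    if max_model_len > max_kv_tokens:
        raise ValueError(
            f"The model's max seq len ({max_model_len}) is larger than the "
            f"maximum number of tokens that can be stored in KV cache "
            f"({max_kv_tokens}). Try increasing gpu_memory_utilization or "
            f"decreasing max_model_len when initializing the engine."
        )
    return True

# With the numbers from the log: 282 blocks * 16 tokens = 4512 < 8192,
# so a max_model_len of 8192 fails the check, while 4096 would pass.
```

Raising gpu_memory_utilization grows `num_gpu_blocks` (more cache), while lowering max_model_len shrinks the left-hand side — either change can satisfy the inequality.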
Dependencies
vllm 0.2.7
torch 2.1.0
Runtime logs or screenshots
# Please paste the run log here
xusenlinzy commented
Try setting the context length with the environment variable CONTEXT_LEN=4096, or an even smaller value.
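For reference, that suggestion would look like the following at launch time (this assumes server.py reads CONTEXT_LEN from the environment, as described above; the launch command itself is a sketch):

```shell
# Cap the context length below the KV-cache capacity reported in the
# error (4512 tokens here), then relaunch the server.
export CONTEXT_LEN=4096
python server.py
```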