Qwen-14B-Chat-Int4 fails to load
Jasonsey opened this issue · 1 comments
Jasonsey commented
The following items must be checked before submission
- Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
- I have read the project documentation and the FAQ, searched the existing issues, and found no similar problem or solution.
Type of problem
Model inference and deployment
Operating system
Linux
Detailed description of the problem
Starting Qwen-14B-Chat-Int4 fails with the following error:
Traceback (most recent call last):
  File "/home/notebook/code/personal/IntentEngine/tmp/api-for-open-llm/server.py", line 2, in <module>
    from api.models import EMBEDDED_MODEL, GENERATE_MDDEL, app, VLLM_ENGINE
  File "/home/notebook/code/personal/IntentEngine/tmp/api-for-open-llm/api/models.py", line 135, in <module>
    GENERATE_MDDEL = create_generate_model() if (not config.USE_VLLM and config.ACTIVATE_INFERENCE) else None
  File "/home/notebook/code/personal/IntentEngine/tmp/api-for-open-llm/api/models.py", line 43, in create_generate_model
    model, tokenizer = load_model(
  File "/home/notebook/code/personal/IntentEngine/tmp/api-for-open-llm/api/apapter/model.py", line 235, in load_model
    model, tokenizer = adapter.load_model(
  File "/home/notebook/code/personal/IntentEngine/tmp/api-for-open-llm/api/apapter/model.py", line 107, in load_model
    model = self.model_class.from_pretrained(
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3257, in from_pretrained
    model = quantizer.post_init_model(model)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
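Preliminary analysis: the error comes from the GPTQ module that the Int4 checkpoint uses. The Exllama backend requires every module to be on the GPU, so the loader needs to be made compatible with this case.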
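As a possible workaround outside the project's loader, the two options implied by the error message can be sketched as follows: either pin all modules to one GPU so that Exllama can be used, or disable Exllama as the message suggests. This is a minimal sketch, not the project's fix. The helper names `build_gptq_load_kwargs` and `load_qwen_int4` and the `gptq_overrides` key are hypothetical. It assumes a transformers release whose `GPTQConfig` still accepts `disable_exllama` (newer releases prefer `use_exllama=False`) and that the model fits on a single GPU when `all_on_gpu` is true.

```python
from typing import Any, Dict


def build_gptq_load_kwargs(all_on_gpu: bool, gpu_index: int = 0) -> Dict[str, Any]:
    """Build from_pretrained keyword arguments for a GPTQ checkpoint (hypothetical helper)."""
    if all_on_gpu:
        # Option 1: place every module on one GPU so the Exllama kernels stay usable.
        return {"device_map": {"": gpu_index}, "trust_remote_code": True}
    # Option 2: allow CPU/disk offload and turn Exllama off, as the error suggests.
    return {
        "device_map": "auto",
        "trust_remote_code": True,
        # Hypothetical key, converted to a GPTQConfig in load_qwen_int4 below.
        "gptq_overrides": {"bits": 4, "disable_exllama": True},
    }


def load_qwen_int4(model_path: str, all_on_gpu: bool = True):
    """Load the quantized model; transformers is imported lazily on purpose."""
    from transformers import AutoModelForCausalLM, GPTQConfig

    kwargs = build_gptq_load_kwargs(all_on_gpu)
    overrides = kwargs.pop("gptq_overrides", None)
    if overrides is not None:
        # Assumption: this transformers version accepts disable_exllama here.
        kwargs["quantization_config"] = GPTQConfig(**overrides)
    return AutoModelForCausalLM.from_pretrained(model_path, **kwargs)
```

Calling `load_qwen_int4("Qwen/Qwen-14B-Chat-Int4", all_on_gpu=False)` would take the second path. Disabling Exllama is likely to be slower at inference time, so the single-GPU placement is worth trying first when memory allows.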
Dependencies
# Please paste the dependencies here
Runtime logs or screenshots
# Please paste the run log here