[BUG] Ask about Qwen models with weight quantization
Opened this issue · 2 comments
Issue description:
LightLLM's models directory includes qwen_wquant, and I would like to know which version of the Qwen models this code supports. I downloaded Qwen-7b-chat-AWQ and Qwen1.5-7b-chat-AWQ locally, and both fail with: AttributeError: 'QwenTransformerLayerWeightQuantized' object has no attribute 'q_weight_'
Environment:
- Using container
- OS: (Ubuntu 14.04, CentOS7)
- GPU info: nvidia-smi (e.g. NVIDIA-SMI 525.116.04, Driver Version: 525.116.04, CUDA Version: 12.0)
- Graphics cards: (e.g. 4090x8)
- Python: (e.g. CPython 3.9; currently, only python>=3.9 is supported)
- LightLLM: (git commit hash)
  - for container: docker run --entrypoint cat --rm ghcr.io/modeltc/lightllm:main /lightllm/.git/refs/heads/main
- openai-triton: pip show triton
@Cesilina There is currently no way to directly load pre-quantized weights like these; the supported path is to load the fp16 weights and quantize them to int4 at load time.
Also, there are many quantization schemes (per channel, per group, etc.), so you may need to modify the code yourself to support a specific one.
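To illustrate the workflow the reply describes (quantizing fp16 weights to int4 at load time, rather than reading a pre-quantized AWQ checkpoint), here is a minimal sketch of symmetric per-group int4 quantization. This is not LightLLM's actual implementation; the function names and the NumPy formulation are hypothetical, and real kernels would pack two int4 values per byte and run on GPU.

```python
import numpy as np

def quantize_int4_per_group(w, group_size=128):
    """Symmetric per-group int4 quantization of an fp16 weight matrix.

    w: (out_features, in_features) array; in_features must be divisible
    by group_size. Returns (q, scales): q holds integers in [-8, 7]
    (stored in int8 here for simplicity), scales holds one fp16 scale
    per group of group_size input channels.
    """
    out_f, in_f = w.shape
    assert in_f % group_size == 0, "in_features must be a multiple of group_size"
    g = w.reshape(out_f, in_f // group_size, group_size).astype(np.float32)
    # One scale per group: map the group's max magnitude onto the int4 limit 7.
    scales = np.abs(g).max(axis=-1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid division by zero
    q = np.clip(np.round(g / scales), -8, 7).astype(np.int8)
    return q.reshape(out_f, in_f), scales.squeeze(-1).astype(np.float16)

def dequantize_int4_per_group(q, scales, group_size=128):
    """Reconstruct an approximate fp32 weight matrix from (q, scales)."""
    out_f, in_f = q.shape
    g = q.reshape(out_f, in_f // group_size, group_size).astype(np.float32)
    return (g * scales[..., None]).reshape(out_f, in_f)
```

The per-group scale is why checkpoint formats diverge: AWQ stores its own packed layout and scales, so a loader expecting plain fp16 tensors (to quantize itself) cannot consume them without format-specific code.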