[BUG] Ask about Qwen models with weight quantization
Opened this issue · 2 comments
Issue description:
LightLLM's models directory includes qwen_wquant, and I would like to know which version of the Qwen models this code supports. I downloaded Qwen-7b-chat-AWQ and Qwen1.5-7b-chat-AWQ locally, and both fail with: AttributeError: 'QwenTransformerLayerWeightQuantized' object has no attribute 'q_weight_'
Environment:
- Using container
- OS: (Ubuntu 14.04, CentOS7)
- GPU info: nvidia-smi (e.g. NVIDIA-SMI 525.116.04, Driver Version: 525.116.04, CUDA Version: 12.0)
- Graphics cards: (e.g. 4090x8)
- Python: (e.g. CPython 3.9; currently, only python>=3.9 is supported)
- LightLLM: (git commit hash)
  - for container: docker run --entrypoint cat --rm ghcr.io/modeltc/lightllm:main /lightllm/.git/refs/heads/main
- openai-triton: pip show triton
@Cesilina There is currently no way to directly load pre-quantized weights like these; the supported path is to load the fp16 weights and quantize them to int4 at load time.
Also, there are many quantization schemes (per channel, per group, etc.), so you may need to modify the code yourself to support a specific one.
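To illustrate the workflow the reply describes (quantizing fp16 weights to int4 at load time, rather than reading a pre-quantized AWQ checkpoint), here is a minimal sketch of symmetric per-group int4 quantization. This is not LightLLM's actual implementation; the function names and the NumPy formulation are hypothetical, and real kernels would pack two int4 values per byte and run on GPU.

```python
import numpy as np

def quantize_int4_per_group(w, group_size=128):
    """Symmetric per-group int4 quantization of an fp16 weight matrix.

    w: (out_features, in_features) array; in_features must be divisible
    by group_size. Returns (q, scales): q holds integers in [-8, 7]
    (stored in int8 here for simplicity), scales holds one fp16 scale
    per group of group_size input channels.
    """
    out_f, in_f = w.shape
    assert in_f % group_size == 0, "in_features must be a multiple of group_size"
    g = w.reshape(out_f, in_f // group_size, group_size).astype(np.float32)
    # One scale per group: map the group's max magnitude onto the int4 limit 7.
    scales = np.abs(g).max(axis=-1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid division by zero
    q = np.clip(np.round(g / scales), -8, 7).astype(np.int8)
    return q.reshape(out_f, in_f), scales.squeeze(-1).astype(np.float16)

def dequantize_int4_per_group(q, scales, group_size=128):
    """Reconstruct an approximate fp32 weight matrix from (q, scales)."""
    out_f, in_f = q.shape
    g = q.reshape(out_f, in_f // group_size, group_size).astype(np.float32)
    return (g * scales[..., None]).reshape(out_f, in_f)
```

The per-group scale is why checkpoint formats diverge: AWQ stores its own packed layout and scales, so a loader expecting plain fp16 tensors (to quantize itself) cannot consume them without format-specific code.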