intel/neural-speed

Qwen2 GPTQ breaks in cpp_model.Model.np_bestla_qpack


Hi,
The model I used is Qwen1.5-0.5B-Chat-GPTQ-Int4 from Hugging Face.
After some debugging, it seems the model is not converted correctly by:

cpp_model.Model.np_bestla_qpack(

It breaks there silently, with no error or message shown, and the program continues to run. Generation then fails on the first attempt with:

error loading model: model.cpp: tensor 'model.layers.0.self_attn.q_proj.weight' is missing from model
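
For reference, a minimal sketch of how to reproduce this, following the GPTQ example in the neural-speed docs (the init parameters here are assumptions taken from that example, not necessarily the exact settings I ran):

from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

# Assumed model id: the GPTQ checkpoint that triggers the failure.
model_name = "Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = Model()
# use_gptq=True routes through the pre-quantized GPTQ conversion path,
# which is where np_bestla_qpack is invoked.
model.init(model_name, weight_dtype="int4", compute_dtype="int8", use_gptq=True)
# Fails at first generation with the "missing tensor" error above.
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=30)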

@yuchen2580 Hi, thanks for reporting this issue.

I have also noticed this problem with Qwen1.5-0.5B-Chat-GPTQ-Int4.

I suspect there is something wrong with this GPTQ model itself. It should have been quantized from https://hf-mirror.com/Qwen/Qwen1.5-0.5B.

In Qwen1.5-0.5B-Chat-GPTQ-Int4, there is no lm_head.weight in model.safetensors, while the original Qwen1.5-0.5B does have this weight.

If you run the commands below to inspect both checkpoints, you will see this discrepancy.

from safetensors.torch import load_file

# Load the checkpoint and list its tensor names.
tensors = load_file("model.safetensors")
print(tensors.keys())
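
To check specifically for the tensor in question, a minimal sketch (point the path at whichever checkpoint you downloaded):

from safetensors.torch import load_file

keys = load_file("model.safetensors").keys()
# Expect True for the original Qwen1.5-0.5B,
# but False for Qwen1.5-0.5B-Chat-GPTQ-Int4.
print("lm_head.weight" in keys)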

Qwen1.5-0.5B-Chat-GPTQ-Int4:
[screenshot: tensor keys, with no lm_head.weight entry]

Qwen1.5-0.5B:
[screenshot: tensor keys, including lm_head.weight]

Qwen1.5-0.5B-Chat-GPTQ-Int4 also doesn't work in the Hugging Face hosted inference widget, which further suggests something is wrong with this model.
[screenshot: Hugging Face inference widget failing for Qwen1.5-0.5B-Chat-GPTQ-Int4]

Alternatively, you can try https://huggingface.co/Qwen/Qwen1.5-7B-Chat-GPTQ-Int4, which works.
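
For example, initializing the 7B checkpoint the same way should succeed (parameters again assumed from the docs' GPTQ example):

from neural_speed import Model

model = Model()
# Known-working GPTQ checkpoint per this thread; same init flow as above.
model.init("Qwen/Qwen1.5-7B-Chat-GPTQ-Int4", weight_dtype="int4", compute_dtype="int8", use_gptq=True)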