Fails to load saved model : Trying to set a tensor of shape torch.Size([1376, 4096]) in "qweight" (which has shape torch.Size([4096, 1376])), this look incorrect.
kranipa opened this issue · 8 comments
Loading the saved model runs into the following error. Quantizing and saving the model also takes a very long time.
2024-03-21 08:48:58 [INFO] loading weights file models/4_bit_llama2-rtn/model.safetensors
2024-03-21 08:48:58 [ERROR] Trying to set a tensor of shape torch.Size([1376, 4096]) in "qweight" (which has shape torch.Size([4096, 1376])), this look incorrect.
2024-03-21 08:48:58 [ERROR] Saved low bit model loading failed, please check your model.
I tried the following example.
import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig, GPTQConfig, AwqConfig
model_path = "meta-llama/Llama-2-7b-chat-hf" # your_pytorch_model_path_or_HF_model_name
saved_dir = "models/4_bit_llama2-rtn" # your_saved_model_dir
#model_path = "Intel/neural-chat-7b-v3-3"
#saved_dir = "models/4_bit_neural_chat_7b-v3-3-rtn"
# quant
woq_config = RtnConfig(bits=4, compute_dtype="int8", scale_dtype='fp32', group_size=32)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             device_map='cpu',
                                             torch_dtype=torch.float16,
                                             quantization_config=woq_config,
                                             trust_remote_code=True,
                                             use_neural_speed=False)
# save quant model
model.save_pretrained(saved_dir)
# load quant model
loaded_model = AutoModelForCausalLM.from_pretrained(saved_dir, trust_remote_code=True)
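For reference, once loading succeeds, a quick generation check along these lines (the prompt and max_new_tokens below are arbitrary placeholders, not from the original report) can confirm the reloaded quantized weights behave:

from transformers import AutoTokenizer

# Sanity check on the reloaded low-bit model: tokenize a short prompt and generate on CPU.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = loaded_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))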
intel-extension-for-transformers ==1.4rc2.dev8+g494a5712fa2
neural-compressor==2.4.1
neural-speed==0.4.dev21+g0ec1a6e
model = AutoModelForCausalLM.from_pretrained(model_path, device_map='cpu', torch_dtype=torch.float16, quantization_config=woq_config, trust_remote_code=True, use_neural_speed=False)
Do you want to use neural_speed? If yes, try setting use_neural_speed=True.
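For reference, that suggestion presumably amounts to passing use_neural_speed=True at load time, roughly as sketched below (same model_path and woq_config as in the snippet above); per the follow-up below, this path hands back a neural_speed Model object rather than a torch module.

# Sketch of the suggested neural_speed path (not verified here):
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             quantization_config=woq_config,
                                             trust_remote_code=True,
                                             use_neural_speed=True)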
Thank you for the response.
Using use_neural_speed=True, the save function doesn't work. I get the following error:
AttributeError: 'Model' object has no attribute 'save_pretrained'
Can you share an example of how to save a quantized model (a Model object) with neural_speed?
It looks like a load/save mismatch. Can you try using the latest commit instead of g494a5712fa2 and set use_neural_speed=False?
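As a generic check (not from the thread), the installed builds can be confirmed at runtime before retrying the quantize/save/load steps with use_neural_speed=False:

# Print the distributions actually picked up by the current environment.
from importlib.metadata import version

print("intel-extension-for-transformers:", version("intel-extension-for-transformers"))
print("neural-compressor:", version("neural-compressor"))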
Hi, thank you. Saving works; however, loading the saved model leads to the following error:
raise ValueError(
ValueError: Unknown quantization type, got rtn - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm']
The following is the code snippet:
import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig, GPTQConfig, AwqConfig
model_path = "meta-llama/Llama-2-7b-chat-hf" # your_pytorch_model_path_or_HF_model_name
saved_dir = "models/4_bit_llama2-rtn" # your_saved_model_dir
#model_path = "Intel/neural-chat-7b-v3-3"
#saved_dir = "models/4_bit_neural_chat_7b-v3-3-rtn"
# quant
woq_config = RtnConfig(bits=4)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             device_map='cpu',
                                             #torch_dtype=torch.float16,
                                             quantization_config=woq_config,
                                             trust_remote_code=True,
                                             use_neural_speed=False)
# save quant model
model.save_pretrained(saved_dir)
# load quant model
loaded_model = AutoModelForCausalLM.from_pretrained(saved_dir, trust_remote_code=True)
@kranipa, this issue is caused by a version mismatch between ITREX and neural-compressor. You can use neural-compressor version 2.5.1 and try it again. ITREX 1.4 is released now, please try it. Thanks very much.
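For reference, pinning matched releases presumably looks something like the sketch below (the exact pip pins are an assumption, not from the thread); after reinstalling, the quantize/save step is repeated and the saved low-bit model reloaded as before.

# Assumed environment fix, e.g.:
#   pip install intel-extension-for-transformers==1.4 neural-compressor==2.5.1
# Then re-run the quantize/save snippet above and reload the saved model:
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

loaded_model = AutoModelForCausalLM.from_pretrained("models/4_bit_llama2-rtn",
                                                    trust_remote_code=True)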
Okay, thank you.
@PhzCode, could you post your code so I can try to reproduce it? Thanks very much.