intel/neural-speed

AssertionError: Fail to convert pytorch model

anthony-intel opened this issue · 3 comments

This happens with the example code only:

from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neural-chat-7b-v3-1"     # Hugging Face model_id or local model
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
print(outputs)

yields

2024-03-27 02:12:43 [INFO] Using Neural Speed...
2024-03-27 02:12:43 [INFO] cpu device is used.
2024-03-27 02:12:43 [INFO] Applying Weight Only Quantization.
2024-03-27 02:12:43 [INFO] Using LLM runtime.
cmd: ['python', PosixPath('/usr/local/lib/python3.10/dist-packages/neural_speed/convert/convert_mistral.py'), '--outfile', 'runtime_outs/ne_mistral_f32.bin', '--outtype', 'f32', 'Intel/neural-chat-7b-v3-1']
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-17-40dcb74a8701> in <cell line: 10>()
      8 streamer = TextStreamer(tokenizer)
      9 
---> 10 model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
     11 outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
     12 print(outputs)

1 frames
/usr/local/lib/python3.10/dist-packages/neural_speed/__init__.py in init(self, model_name, use_quant, use_gptq, use_awq, use_autoround, weight_dtype, alg, group_size, scale_dtype, compute_dtype, use_ggml)
    129         if not os.path.exists(fp32_bin):
    130             convert_model(model_name, fp32_bin, "f32")
--> 131             assert os.path.exists(fp32_bin), "Fail to convert pytorch model"
    132 
    133         if not use_quant:

AssertionError: Fail to convert pytorch model
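For what it's worth, the assertion itself is just an existence check on the converted file: convert_model is expected to write ne_mistral_f32.bin, and the assert fires when it doesn't. A minimal sketch of that logic, using a hypothetical stub converter that fails silently (as the real convert_mistral.py subprocess does when one of its dependencies is missing):

```python
import os
import tempfile

def convert_model(model_name, outfile, outtype):
    # Hypothetical stub for neural_speed's converter. The real converter runs
    # as a subprocess; if a required package is missing it exits without
    # writing the output file, which is what this stub simulates.
    pass

err_msg = None
with tempfile.TemporaryDirectory() as tmp:
    fp32_bin = os.path.join(tmp, "ne_mistral_f32.bin")
    try:
        # Mirrors the check in neural_speed/__init__.py from the traceback.
        if not os.path.exists(fp32_bin):
            convert_model("Intel/neural-chat-7b-v3-1", fp32_bin, "f32")
            assert os.path.exists(fp32_bin), "Fail to convert pytorch model"
    except AssertionError as e:
        err_msg = str(e)

print(err_msg)
```

So the AssertionError is a symptom: the conversion subprocess failed (here, because of missing dependencies), not the assert itself.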


Hi, this issue seems to have the same root cause as #193. pip install neural_speed doesn't install all of the packages from requirements.txt; we are working on a fix. For now, you can run pip install -r requirements.txt as a quick workaround. Thanks.
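In case it helps, the quick fix amounts to installing from the repo's requirements file rather than relying on the wheel's declared dependencies. A sketch, assuming a fresh local checkout of neural-speed:

```shell
# Clone the repo to get requirements.txt, then install its full dependency list.
git clone https://github.com/intel/neural-speed.git
cd neural-speed
pip install -r requirements.txt
```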

@zhentaoyu thanks - looking forward to the fix

Hi @anthony-intel, the issue is now fixed; please refer to https://github.com/intel/neural-speed?tab=readme-ov-file#installation. If you have no further questions, we will close this issue. Thanks.