intel/neural-speed

Can't run inference on Llama 2 through GGUF

ZJkyle opened this issue · 2 comments

ZJkyle commented

I've downloaded TheBloke/Llama-2-7B-Chat-GGUF from Hugging Face and used git lfs pull to fetch all the GGUF files.

Since my request for access to Meta's Llama 2 has not been approved yet, I chose KoboldAI/llama2-tokenizer as the tokenizer. I'm running the code from the README, and here is the error message I get:
    Traceback (most recent call last):
      File "/neural-speed/llama.py", line 16, in <module>
        model = AutoModelForCausalLM.from_pretrained(model_name, model_file = model_file)
      File "/usr/local/lib/python3.10/dist-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 165, in from_pretrained
        model.init_from_bin(model_type, gguf_model_file)
      File "/neural-speed/neural_speed/__init__.py", line 132, in init_from_bin
        self.__import_package(model_type)
      File "/neural-speed/neural_speed/__init__.py", line 46, in __import_package
        import neural_speed.llama_cpp as cpp_model
    ModuleNotFoundError: No module named 'neural_speed.llama_cpp'
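
For reference, my script follows the README's GGUF example; a minimal sketch (the GGUF file name below is a placeholder, not necessarily the exact file I used):

    from transformers import AutoTokenizer, TextStreamer
    from intel_extension_for_transformers.transformers import AutoModelForCausalLM

    model_name = "TheBloke/Llama-2-7B-Chat-GGUF"   # local clone with GGUF files pulled via git lfs
    model_file = "llama-2-7b-chat.Q4_0.gguf"       # placeholder: any one of the pulled GGUF files
    tokenizer_name = "KoboldAI/llama2-tokenizer"   # stand-in while Meta access is pending

    prompt = "Once upon a time"
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt").input_ids
    streamer = TextStreamer(tokenizer)

    model = AutoModelForCausalLM.from_pretrained(model_name, model_file=model_file)
    outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)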

Why is this happening?

ZJkyle commented

I have also tried scripts/python_api_example_for_gguf.py, but I still get the same result.

DDEle commented

Are you running the script from the root directory of the project? If so, Python will treat the local neural_speed source directory as the neural_speed package you are importing, and that copy may not have the C++ extension (i.e. neural_speed.llama_cpp) built; the quick check after the list below can confirm this. To work around it, you can try one of the following:

  • run the script using the scripts directory as your working directory (cd scripts && python python_api_example_for_gguf.py)
  • or, make an editable build of Neural Speed:
    pip uninstall neural_speed       # if you previously installed a non-editable build
    pip install -r requirements.txt
    pip install -ve .                # verbose, editable install from source
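
You can also quickly confirm which copy of the package Python picked up (a minimal check, run from your usual working directory):

    import neural_speed
    # If this prints a path inside your source checkout instead of site-packages,
    # you are importing the unbuilt source tree, and neural_speed.llama_cpp will be missing.
    print(neural_speed.__file__)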