intel/neural-speed

Can't run inference on Llama 2 through GGUF

ZJkyle opened this issue · 2 comments

ZJkyle commented

I've downloaded TheBloke/Llama-2-7B-Chat-GGUF from Hugging Face and used git lfs pull to fetch all the GGUF files.

Since my request for access to Meta's Llama 2 has not been approved yet, I chose KoboldAI/llama2-tokenizer as the tokenizer. I'm running the code from the README, and here is the error message I get:
    Traceback (most recent call last):
      File "/neural-speed/llama.py", line 16, in <module>
        model = AutoModelForCausalLM.from_pretrained(model_name, model_file = model_file)
      File "/usr/local/lib/python3.10/dist-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 165, in from_pretrained
        model.init_from_bin(model_type, gguf_model_file)
      File "/neural-speed/neural_speed/__init__.py", line 132, in init_from_bin
        self.__import_package(model_type)
      File "/neural-speed/neural_speed/__init__.py", line 46, in __import_package
        import neural_speed.llama_cpp as cpp_model
    ModuleNotFoundError: No module named 'neural_speed.llama_cpp'
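
For reference, my script follows the README's GGUF example; a minimal sketch (the GGUF file name below is a placeholder, not necessarily the exact file I used):

    from transformers import AutoTokenizer, TextStreamer
    from intel_extension_for_transformers.transformers import AutoModelForCausalLM

    model_name = "TheBloke/Llama-2-7B-Chat-GGUF"   # local clone with GGUF files pulled via git lfs
    model_file = "llama-2-7b-chat.Q4_0.gguf"       # placeholder: any one of the pulled GGUF files
    tokenizer_name = "KoboldAI/llama2-tokenizer"   # stand-in while Meta access is pending

    prompt = "Once upon a time"
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt").input_ids
    streamer = TextStreamer(tokenizer)

    model = AutoModelForCausalLM.from_pretrained(model_name, model_file=model_file)
    outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)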

Why is this happening?

ZJkyle commented

I have also tried scripts/python_api_example_for_gguf.py, but I still get the same result.

DDEle commented

Are you running the script from the root directory of the project? If so, Python will treat the local neural_speed source directory as the neural_speed package you are importing, and that copy may not have the C++ extension (i.e. neural_speed.llama_cpp) built; the quick check after the list below can confirm this. To work around it, you can try one of the following:

  • run the script using the scripts directory as your working directory (cd scripts && python python_api_example_for_gguf.py)
  • or, make an editable build of Neural Speed:
    pip uninstall neural_speed       # if you previously installed a non-editable build
    pip install -r requirements.txt
    pip install -ve .                # verbose, editable install from source
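
You can also quickly confirm which copy of the package Python picked up (a minimal check, run from your usual working directory):

    import neural_speed
    # If this prints a path inside your source checkout instead of site-packages,
    # you are importing the unbuilt source tree, and neural_speed.llama_cpp will be missing.
    print(neural_speed.__file__)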