Can't run inference on Llama 2 through GGUF
ZJkyle opened this issue · 2 comments
I've downloaded TheBloke/Llama-2-7B-Chat-GGUF from Hugging Face and used git lfs pull to fetch all the GGUF files.
Since my request for access to Meta/Llama 2 has not been approved yet, I chose KoboldAI/llama2-tokenizer as the tokenizer. I'm running the code from the README, and here is the error message I get:
Traceback (most recent call last):
  File "/neural-speed/llama.py", line 16, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, model_file = model_file)
  File "/usr/local/lib/python3.10/dist-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 165, in from_pretrained
    model.init_from_bin(model_type, gguf_model_file)
  File "/neural-speed/neural_speed/__init__.py", line 132, in init_from_bin
    self.__import_package(model_type)
  File "/neural-speed/neural_speed/__init__.py", line 46, in __import_package
    import neural_speed.llama_cpp as cpp_model
ModuleNotFoundError: No module named 'neural_speed.llama_cpp'
Why is this happening?
I have also tried scripts/python_api_example_for_gguf.py, but I still got the same result.
Are you running the script in the root directory of the project? If so, Python will treat the local neural_speed source directory as the neural_speed package you are importing, and that source tree may not have the C++ extensions (i.e. neural_speed.llama_cpp) built. In that case, you can try either of the following:
- run the script with the scripts directory as your working directory (cd scripts && python python_api_example_for_gguf.py)
- or, make an editable build of Neural Speed:
pip uninstall neural_speed  # if you installed a non-editable build previously
pip install -r requirements.txt
pip install -ve .
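To confirm whether you are hitting this shadowing problem, you can check which file Python would actually import a module from. The sketch below is generic (not part of Neural Speed) and uses the standard-library json module as a stand-in so it runs anywhere; in your environment you would substitute "neural_speed" and "neural_speed.llama_cpp" for the names shown:

```python
import importlib.util

def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# Substitute "neural_speed" / "neural_speed.llama_cpp" for these names in
# your environment; "json" is used here only so the sketch runs anywhere.
print(module_origin("json"))          # path of the package that would be imported
print(module_origin("no_such_pkg"))   # prints None: the import would fail
```

If module_origin("neural_speed") points into your source checkout (e.g. /neural-speed/neural_speed/__init__.py) rather than site-packages, the working directory is shadowing the installed build, and neural_speed.llama_cpp will not resolve.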