IndexError when running inference with Llama-2 model
shang-zhu opened this issue · 3 comments
shang-zhu commented
Hi, thanks for this amazing work.
I followed the installation guide in issue #25, but I get the following error when running the inference command below on 2 V100 GPUs, each with 32GB:
python src/run_generation.py --model_type llama --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
--prefix "<s>[INST] <<SYS>>\n You are a helpful assistant. Answer with detailed responses according to the entire instruction or question. \n<</SYS>>\n\n Summarize the following book: " \
--prompt example_inputs/harry_potter.txt \
--suffix " [/INST]" --test_unlimiformer --fp16 --length 200 --layer_begin 16 \
--index_devices 0 --datastore_device 0
Error:
File "/ocean/projects/cts180021p/shang9/foundation_models/openLLM4chem/unlimiformer/src/unlimiformer.py", line 1086, in preprocess_query
cos = cos[:,:,-1] # [1, 1, dim]
IndexError: too many indices for tensor of dimension 2
Do you know what might be going wrong? Thanks.
urialon commented
Hi @shang-zhu,
Thank you for your interest in our work!
What are your PyTorch and transformers versions?
Best,
Uri
shang-zhu commented
Thank you for your quick reply!
Here are my PyTorch and transformers versions:
torch 2.1.0 pypi_0 pypi
transformers 4.36.0.dev0 pypi_0 pypi
shang-zhu commented
I actually got it working with the following software versions:
pytorch 2.0.1 py3.11_cuda11.7_cudnn8.5.0_0 pytorch
transformers 4.31.0 pypi_0 pypi
Thanks for the help!
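In case it helps anyone who hits the same IndexError: the thread does not pin down the root cause, only that torch 2.0.1 with transformers 4.31.0 worked while torch 2.1.0 with transformers 4.36.0.dev0 did not. Below is a minimal sketch of a check that compares the installed packages against that working pair. The names KNOWN_GOOD, installed_version, and find_mismatches are hypothetical helpers made up for illustration and are not part of unlimiformer; the version pair is just what was reported here, not an official compatibility list.

```python
from importlib import metadata

# Versions reported as working in this thread (assumption: not an official list).
KNOWN_GOOD = {"torch": "2.0.1", "transformers": "4.31.0"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(known_good=KNOWN_GOOD, lookup=installed_version):
    """Map each package whose version differs to an (installed, expected) pair."""
    mismatches = {}
    for package, expected in known_good.items():
        found = lookup(package)
        # Ignore local build tags such as "+cu117" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (found, expected)
    return mismatches


if __name__ == "__main__":
    for package, (found, expected) in find_mismatches().items():
        print(f"{package}: installed {found}, thread reports {expected} working")
```

Running this before src/run_generation.py prints one line per package that differs from the working pair and prints nothing when both match. A different version is not guaranteed to fail; the check only flags a departure from the one combination confirmed above.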