abertsch72/unlimiformer

IndexError when running inference with Llama-2 model

shang-zhu opened this issue · 3 comments

Hi thanks for this amazing work.

I followed the installation guide in issue #25, but I get the following error when running the inference command below on two V100 GPUs (32 GB each):

python src/run_generation.py --model_type llama --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --prefix "<s>[INST] <<SYS>>\n You are a helpful assistant. Answer with detailed responses according to the entire instruction or question. \n<</SYS>>\n\n Summarize the following book: " \
    --prompt example_inputs/harry_potter.txt \
    --suffix " [/INST]" --test_unlimiformer --fp16 --length 200 --layer_begin 16 \
    --index_devices 0 --datastore_device 0

Error:

File "/ocean/projects/cts180021p/shang9/foundation_models/openLLM4chem/unlimiformer/src/unlimiformer.py", line 1086, in preprocess_query
    cos = cos[:,:,-1]  # [1, 1, dim]
IndexError: too many indices for tensor of dimension 2
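
For context: that line indexes cos with three indices, which (judging by the inline "# [1, 1, dim]" comment) assumes the rotary-embedding tensors are 4-D with shape [1, 1, seq_len, dim], whereas in my environment they apparently come back 2-D. A minimal sketch of the mismatch, with shapes assumed from the comment and the traceback rather than taken from the actual model code:

import torch

seq_len, dim = 8, 128

# Shape the indexing seems to expect (inferred from the "# [1, 1, dim]" comment):
cos_old = torch.randn(1, 1, seq_len, dim)
print(cos_old[:, :, -1].shape)   # torch.Size([1, 1, 128])

# Shape my transformers install appears to produce:
cos_new = torch.randn(seq_len, dim)
# cos_new[:, :, -1]              # IndexError: too many indices for tensor of dimension 2
print(cos_new[-1].shape)         # torch.Size([128]) -- last position in the 2-D layout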

Do you know what may be going wrong? Thanks.

Hi @shang-zhu ,
Thank you for your interest in our work!

What are your PyTorch and transformers versions?
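
(If it helps, a quick way to print both, assuming a standard install:)

import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)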

Best,
Uri

Thank you for your quick reply!

Here are my PyTorch and transformers versions:

torch                     2.1.0                    pypi_0    pypi
transformers              4.36.0.dev0              pypi_0    pypi

I actually made it work with the following software versions:

pytorch                   2.0.1           py3.11_cuda11.7_cudnn8.5.0_0    pytorch
transformers              4.31.0                   pypi_0    pypi
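
In case it helps anyone who hits the same IndexError, here is a small sanity check I can drop at the top of a script; the pinned prefixes are just the combination that happened to work for me, not an official requirement:

import torch
import transformers

# Fail fast if the environment drifts from the combination that worked for me.
assert torch.__version__.startswith("2.0"), torch.__version__
assert transformers.__version__.startswith("4.31"), transformers.__version__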

Thanks for the help!