epfml/landmark-attention

Assertion `srcIndex < srcSelectDimSize` failed while running test.


L16H7 commented

I ran into this when running `python run_test.py` after recovering weights from the released weights diff and the base model weights at https://huggingface.co/huggyllama/llama-7b.

```
Number of tokens in this prompt:  2739
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [744,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [744,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [744,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [744,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [744,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [744,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [744,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```

Any idea what I should be looking at? Thanks.
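
For reference, this device-side assert in `indexSelectLargeIndex` means an `index_select` (or embedding lookup) received an index at or beyond the size of the indexed dimension, e.g. a token id outside the embedding table. A minimal sketch of a sanity check, assuming a standard Hugging Face-style model (`model` and `input_ids` are illustrative names, not from this repo):

```python
import torch

def check_token_ids(input_ids: torch.Tensor, model) -> None:
    """Raise a readable error if any token id is outside the embedding table."""
    vocab_size = model.get_input_embeddings().num_embeddings
    max_id = int(input_ids.max())
    if max_id >= vocab_size:
        raise ValueError(
            f"token id {max_id} >= embedding table size {vocab_size}; "
            "the tokenizer and model weights are likely mismatched"
        )
```

Running the same prompt on CPU also tends to turn the opaque device-side assert into a plain `IndexError` that names the offending index.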
mkrima commented

Hi,

I just pushed a commit that fixes a bug in generation with models not trained with flash attention; the bug was introduced when the Triton implementation was added. Can you check whether this fixes the issue for you? I tested run_test.py and so far it has been working. If the problem persists, please provide additional details:

1- The full stack trace and, if you can, the location of the instruction that raises this error (see the sketch after this list for getting an accurate trace).
2- Whether you have made any changes to the code base.
3- Whether this is happening for inference with the base model or the landmark model.
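
On point 1: CUDA kernels launch asynchronously, so the Python stack trace for a device-side assert usually points past the op that actually failed. A minimal sketch of the standard PyTorch workaround (general debugging advice, not specific to this repo) is to force synchronous launches before CUDA initializes:

```python
# Force synchronous CUDA kernel launches so the stack trace lands on the op
# that triggered the device-side assert. The variable must be set before CUDA
# is initialized -- simplest is before importing torch.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402 -- deliberately imported after setting the variable
```

Equivalently, run the test as `CUDA_LAUNCH_BLOCKING=1 python run_test.py` from the shell.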

L16H7 commented

Thank you for your reply and the fix. I figured out that I hadn't done the weights recovery correctly: I had set `--path_tuned <your_path_tuned>` incorrectly by pointing it at the base LLaMA weights. After using an empty output folder instead, it works now. Thanks a lot.
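
For anyone else who hits this: the recovery step writes the recovered landmark weights to `--path_tuned`, so that path must be an empty output directory, not the base model. A hypothetical sketch, assuming the repo follows the Alpaca-style `weight_diff.py` recovery interface (the script name and the `--path_raw`/`--path_diff` flags here come from that convention and may differ):

```
# Hypothetical Alpaca-style recovery invocation; all paths are placeholders.
python weight_diff.py recover \
    --path_raw /path/to/huggyllama/llama-7b \
    --path_diff /path/to/released/weights/diff \
    --path_tuned /path/to/empty/output/dir  # output dir, NOT the base model
```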