maitrix-org/llm-reasoners

RuntimeError: CUDA error: device-side assert triggered when using Llama 2 from HF

andreasbinder opened this issue · 3 comments

Good Day!
I tried to run the GSM8k example with the model from HF as you described (only adjusting the log and prompt paths):

  CUDA_VISIBLE_DEVICES=0,1 python examples/rap_gsm8k/inference.py --base_lm hf --hf_path meta-llama/Llama-2-70b-hf --hf_peft_path None --hf_quantized 'nf4'

However, I receive the following error:

    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I think this is related to the warning also mentioned in the log trace:

    llm-reasoners/reasoners/lm/hf_model.py:137: UserWarning: the eos_token '\n' is encoded into [29871, 13] with length != 1, using 13 as the eos_token_id
      warnings.warn(f'the eos_token {repr(token)} is encoded into {tokenized} with length != 1, '

From searching on GitHub, I suspect it is related to an input mismatch caused by incorrect tokenisation: 1 2 3.
Did you also encounter this problem, and if so, how did you work around it?
I will try the other versions of Llama in the meantime.
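
As a quick sanity check of that theory, one thing I could try is verifying that no input id reaches the model's vocabulary size, since an out-of-range index in the embedding lookup is a common cause of this device-side assert (the model id and prompt below are just placeholders):

    from transformers import AutoConfig, AutoTokenizer

    model_id = "meta-llama/Llama-2-70b-hf"  # placeholder: the model I run
    tok = AutoTokenizer.from_pretrained(model_id)
    cfg = AutoConfig.from_pretrained(model_id)

    ids = tok("Q: an example GSM8k question")["input_ids"]  # placeholder prompt
    # If this fails, some token id is out of range for the embedding table.
    assert max(ids) < cfg.vocab_size, (max(ids), cfg.vocab_size)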

I am using transformers 4.33.1

Thx!

Ber666 commented

Hi, for the CUDA error, could you try following the message and rerun with CUDA_LAUNCH_BLOCKING=1 for debugging?
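
For reference, the CUDA runtime reads this variable when the context is created, so it has to be set before the first CUDA call. Prefixing the launch command with CUDA_LAUNCH_BLOCKING=1 works; equivalently, a minimal sketch sets it at the very top of the script:

    import os

    # Must happen before any CUDA work: with blocking launches, kernels run
    # synchronously, so the Python stack trace points at the failing call.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch  # imported afterwards, before the CUDA context exists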

The warning you showed shouldn't matter; it's expected in this example. We want the generation to stop at \n, and 13 is the token id of \n. For some reason, \n is encoded into two tokens ([29871, 13]), so we just use 13 as the eos_token_id.
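
For example, with the standard Llama 2 tokenizer (a minimal sketch; the model path is just for illustration), SentencePiece prepends a leading-space marker (29871) before the newline token (13), which is why \n encodes to two ids:

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")
    print(tok.encode("\n", add_special_tokens=False))  # [29871, 13]

    # Taking the last id still makes generation stop at a newline:
    eos_token_id = tok.encode("\n", add_special_tokens=False)[-1]  # 13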

Please send us more detailed information about the error, since the RuntimeError and the warning alone don't tell us enough. We'd be delighted to help :p

andreasbinder commented

Hi! I am sorry for the late reply :(
I have worked with TheBloke/Llama-2-13B-GPTQ for most of my experiments so far. I now tried Llama 2 again, and I did not run into the problem this time ^^
In case I encounter the error again and find the corresponding solution, I will let you know!