maitrix-org/llm-reasoners

Llama3-70B inference stopping issue

sjaelee25 opened this issue · 0 comments

First of all, thank you for your well-constructed code!

I tried running rap_gsm8k with the newly released Llama3 and successfully completed the experiment with Llama3-8B. With Llama3-70B, however, the run stalls during the inference stage (max_seq_len = 2048): it hangs without any specific error message.

Therefore, I tried reducing the max_seq_len value and cutting the 4-shot input prompt down to one-shot or zero-shot. In these cases, the first dozens or hundreds of iterations of the next-token generation loop proceed without any issues, but the process then hangs without any error message, either mid-loop or upon receiving the next prompt.
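To narrow down where each rank gets stuck when this happens, a watchdog around the generation loop can dump stack traces on a stall. This is only a diagnostic sketch using Python's standard faulthandler module; generate_step, num_steps, and TIMEOUT_S are hypothetical placeholders for the actual loop in examples/rap_gsm8k/inference.py, not code from this repo:

# Diagnostic sketch only (not from llm-reasoners): arm a watchdog before each
# next-token step; if a step stalls past TIMEOUT_S seconds, the process dumps
# the Python stack traces of all threads to stderr so the blocking call is visible.
import faulthandler
import sys

TIMEOUT_S = 120  # assumed stall threshold; tune to your per-token latency

def generate_with_watchdog(generate_step, num_steps):
    """generate_step is a hypothetical callable wrapping the real next-token call."""
    for step in range(num_steps):
        faulthandler.dump_traceback_later(TIMEOUT_S, exit=False, file=sys.stderr)
        token = generate_step(step)  # placeholder for the model's decode step
        faulthandler.cancel_dump_traceback_later()  # step finished in time; disarm
        if token is None:  # stand-in for an EOS check
            break

When the hang occurs, comparing the dumped tracebacks across the 8 ranks should show whether some rank is blocked inside an NCCL collective, a common cause of silent multi-GPU hangs when ranks desynchronize.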

If you've experienced a similar issue or have resolved it before, any assistance would be greatly appreciated! Thanks!

My command is as follows:

torchrun --nproc-per-node 8 --master-port 6666 examples/rap_gsm8k/inference.py --base_lm llama-3 --llama_3_ckpts /home/llama3/ --llama_size 70B
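In case it helps with reproducing or diagnosing this, the same command could be rerun with more verbose distributed logging; NCCL_DEBUG and TORCH_DISTRIBUTED_DEBUG are standard NCCL/PyTorch environment variables, not specific to this repo:

NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL torchrun --nproc-per-node 8 --master-port 6666 examples/rap_gsm8k/inference.py --base_lm llama-3 --llama_3_ckpts /home/llama3/ --llama_size 70B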