likejazz/llama3.cuda

Output same word


Hi, I downloaded the code and compiled it, but got unexpected results.
Environment:
OS: CentOS 7.5
GPU: V100, 32 GB
gcc: gcc-8

make -j 20
./runcuda "I have a dream"

The model outputs: "I have a dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream"

The model just repeats the same word, which looks wrong.

Unfortunately I don't have a V100, so I can't test it. Could you please print the full logit values for debugging?


Yes. I printed the first five logit values at line 821; they are [0.019237, 0.026642, 0.019238, 0.019239, 0.019238].
As the generate function proceeds, the logits stay exactly the same. It seems that the new token does not participate in the generation process.
At every iteration the largest probability is always 0.176558, so the decoded token is always "dream".
My CUDA version is 12.2 and gcc is 8.5.
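
To make the "logits never change" observation easier to confirm, a small host-side check like the one below could be called once per generation step. This is only a sketch: the helper names (argmax, max_abs_diff, report_step) are hypothetical and not part of llama3.cuda, and it assumes the logits have already been copied to host memory as a float array and that the caller keeps its own copy of the previous step's logits.

```c
#include <assert.h>
#include <stdio.h>

/* Index of the largest value in v[0..n-1]. */
static int argmax(const float *v, int n) {
    assert(v != NULL && n > 0);
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (v[i] > v[best]) best = i;
    }
    return best;
}

/* Largest absolute element-wise difference between a and b. */
static float max_abs_diff(const float *a, const float *b, int n) {
    assert(a != NULL && b != NULL && n > 0);
    float worst = 0.0f;
    for (int i = 0; i < n; i++) {
        float d = a[i] - b[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}

/*
 * Hypothetical per-step diagnostic. Prints the top token id, its logit,
 * and how much the logits moved since the previous step.
 * prev may be NULL on the first step.
 * Returns 1 when the logits are unchanged within tol, 0 otherwise.
 */
static int report_step(int step, const float *prev, const float *cur,
                       int n, float tol) {
    int top = argmax(cur, n);
    float diff = (prev != NULL) ? max_abs_diff(prev, cur, n) : -1.0f;
    fprintf(stderr, "step %d: argmax=%d logit=%f max_abs_diff=%f\n",
            step, top, cur[top], diff);
    return prev != NULL && diff <= tol;
}
```

If report_step returns 1 on every step after the first, the forward pass is producing identical output regardless of the input token, which points at the computation itself rather than at sampling or decoding.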

I also tested the code on an A100 GPU with CUDA 12.2 and gcc 9.3, and the result is the same.

Could you please share your compiler and GPU information? I will try my best to reproduce your results. Thanks.