juncongmoo/pyllama

Quick Question

ArgusK17 opened this issue · 0 comments

I am reading the code and noticed that in llama/generation.py, line 77, we have:
i = tokens[:, prev_pos:cur_pos].
But from the second iteration onward, cur_pos = prev_pos + 1, so the slice contains only one token.

Is that correct? I thought that Transformer models need to take all previous tokens as input. I am a beginner with these models and I really want to understand this better.
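For concreteness, here is a tiny sketch of the slicing pattern I mean. The values and helper variables are hypothetical; only the prev_pos/cur_pos slicing mirrors generation.py:

```python
# Hypothetical illustration of the prev_pos:cur_pos slicing in generation.py.
# One "batch" with a prompt of length 3, padded out to a max length of 5.
tokens = [[101, 102, 103, 0, 0]]
start_pos = 3  # length of the prompt

prev_pos = 0
fed = []  # record what each model call would receive
for cur_pos in range(start_pos, 5):
    # the model is called on tokens[:, prev_pos:cur_pos]
    chunk = [row[prev_pos:cur_pos] for row in tokens]
    fed.append(chunk)
    # pretend the model predicted the next token id
    tokens[0][cur_pos] = 200 + cur_pos
    prev_pos = cur_pos

# The first call sees the whole prompt; every later call sees
# exactly one token -- the slice the question is about.
print(fed)  # [[[101, 102, 103]], [[203]]]
```

So the first forward pass gets the full prompt, and each subsequent pass gets only the single most recently generated token, which is what made me ask whether the model is missing the earlier context.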