Potential mistake
Opened this issue · 0 comments
Kalyan0821 commented
Why are stacked_cur_toks and curr_toks_tensor here updated with the random tokens? I thought this was only required at the input, i.e., for full_embeddings in L67.