question about RoPE code
yukyeongmin opened this issue · 3 comments
`self.cos_cached` and `self.sin_cached` have the same shape as `x`, don't they?
So if this line is intended to compute RoPE on only part of `x`, that is `x[..., :self.d]`, I think it should be:
`x_rope = (x_rope * self.cos_cached[..., :self.d]) + (neg_half_x * self.sin_cached[..., :self.d])`
Please let me know if I'm wrong.
You are correct that `self.cos_cached` and `self.sin_cached` have the same shape as `x`.
As for the modification, that is also correct, because it would ensure that the rotary embeddings are applied only to the subset of features specified by `self.d`.
They have similar shapes. Slicing the cached sin/cos to `x.shape[0]` truncates them to the sequence length, because the sequence length (number of tokens per sample) can change from one input to the next.
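To make the shape argument concrete, here is a minimal sketch of the slicing and broadcasting involved. It is not the repository's code: it uses NumPy in place of PyTorch, and it assumes that `x` is laid out as `[seq_len, batch, heads, d_model]` and that the caches are stored as `[max_seq_len, 1, 1, d]`. The helper names `build_cache` and `apply_rope` are made up for illustration. Under those assumptions the cache's last axis already has size `d`, so only the first axis (positions) needs truncating.

```python
import numpy as np


def build_cache(seq_len: int, d: int, base: float = 10_000.0):
    """Return cos/sin tables of shape [seq_len, 1, 1, d] (assumed layout)."""
    theta = 1.0 / (base ** (np.arange(0, d, 2) / d))     # [d/2]
    angles = np.outer(np.arange(seq_len), theta)         # [seq_len, d/2]
    angles = np.concatenate([angles, angles], axis=1)    # [seq_len, d]
    return np.cos(angles)[:, None, None, :], np.sin(angles)[:, None, None, :]


def apply_rope(x, cos_cached, sin_cached, d):
    """Rotate the first d features of x; x is [seq_len, batch, heads, d_model]."""
    x_rope, x_pass = x[..., :d], x[..., d:]
    half = d // 2
    neg_half_x = np.concatenate([-x_rope[..., half:], x_rope[..., :half]], axis=-1)
    # Slice the cache along axis 0 (positions) only; the size-1 axes broadcast
    # over batch and heads, and the last axis already has size d.
    cos, sin = cos_cached[: x.shape[0]], sin_cached[: x.shape[0]]
    x_rope = x_rope * cos + neg_half_x * sin
    # Features beyond d are passed through unchanged.
    return np.concatenate([x_rope, x_pass], axis=-1)


d = 4
cos_cached, sin_cached = build_cache(seq_len=16, d=d)     # cache built for 16 positions
x = np.random.default_rng(0).normal(size=(10, 2, 3, 6))   # 10 tokens, d_model = 6
out = apply_rope(x, cos_cached, sin_cached, d)

print(cos_cached.shape)  # (16, 1, 1, 4)
print(out.shape)         # (10, 2, 3, 6)
```

With this layout, indexing the cache with `[..., :d]` would return the same array, which is why truncating to `x.shape[0]` is the slice that matters here. If the caches were stored with a wider last axis, the extra slice would be needed.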
Thanks for the reply!! @vpj @nagamonish
Didn't you have any problems running that code? The original code didn't work for me with inputs of a different shape, and I thought it was a syntax problem.