Bug in rotary positional embedding
scv11 opened this issue · 0 comments
scv11 commented
I copied the original code, but it has an error: running it raises a tensor-shape exception at this statement.
x_rope = (x_rope * self.cos_cached[:x.shape[0]]) + (neg_half_x * self.sin_cached[:x.shape[0]])
and the error message looks like this:
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 3
After debugging, I found that the positional encodings are applied to only a subset of the features (the first 3 of 4 in the last dimension in this test), while cos_cached and sin_cached keep the full feature dimension of the original x tensor (4 in this test). The elementwise multiplication therefore fails on the last dimension. I think the code should be:
x_rope = ((x_rope * self.cos_cached[:x.shape[0], :, :, :self.d]) +
(neg_half_x * self.sin_cached[:x.shape[0], :, :, :self.d]))
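To illustrate the mismatch, here is a minimal NumPy sketch (shapes and variable names are hypothetical, chosen to mirror the test case above: 4 features, rotary encoding applied to the first d = 3):

```python
import numpy as np

# Hypothetical shapes mirroring the issue: (seq_len, batch, heads, features)
seq_len, d_model, d = 2, 4, 3

x = np.ones((seq_len, 1, 1, d_model))
cos_cached = np.ones((seq_len, 1, 1, d_model))  # cached over the FULL feature dim

x_rope = x[..., :d]  # rotary part: last dim is 3, not 4

# Buggy version: (..., 3) * (..., 4) cannot broadcast on the last dim
err = None
try:
    bad = x_rope * cos_cached[:x.shape[0]]
except ValueError as e:  # PyTorch raises RuntimeError for the same mismatch
    err = e

# Fixed version: slice the cache down to the d rotary features
good = x_rope * cos_cached[:x.shape[0], :, :, :d]
print(err is not None)  # the unsliced multiply failed
print(good.shape)       # (2, 1, 1, 3)
```

The same slicing applies to sin_cached, as in the proposed fix above.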
If I have made any mistakes, please feel free to correct me.