Question about rotation.
Closed this issue · 3 comments
mxjmtxrm commented
I noticed that the rotation applied to the attention and MLP weights is the opposite of what is shown in Figures 4 and 5 of the paper. I am wondering how computational invariance is preserved?
embedding: W @ Q
QKV: W @ Q
forward: x -> embedding -> X @ Q -> QKV -> X @ Q @ W @ Q
Is there anything I'm missing?
Coco58323 commented
In general, a linear layer computes Y = X @ W.T (the weight is applied transposed).
As a result, with an orthogonal Q (so Q @ Q.T = I):
Y = (X @ Q) @ (W @ Q).T = X @ Q @ Q.T @ W.T = X @ W.T
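A minimal NumPy sketch of this invariance check (the shapes and the QR-based construction of an orthogonal Q are illustrative assumptions, not from the paper):

```python
import numpy as np

# Hypothetical sizes: 4 tokens, hidden dimension 8.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))   # activations
W = rng.standard_normal((8, 8))   # linear-layer weight

# Random orthogonal matrix Q (Q @ Q.T == I) via QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

Y = X @ W.T                   # original layer: weight applied transposed
Y_rot = (X @ Q) @ (W @ Q).T   # rotated activations, rotated weight

# (W @ Q).T == Q.T @ W.T, so Q @ Q.T cancels to the identity.
print(np.allclose(Y, Y_rot))  # → True
```

The key point is the transpose in `Y = X @ W.T`: because the weight is applied transposed, rotating both the activations and the weight by the same Q makes the two rotations cancel.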
ponytaill commented
> In general, Y = X @ W.T. As a result, Y = (X @ Q) @ (W @ Q).T = X @ W.T

@Coco58323 Thanks!