Question about rotation.
Closed this issue · 3 comments
mxjmtxrm commented
I noticed that the rotation applied to the attention and MLP weights is the opposite of what is shown in Figures 4 and 5 of the paper. I am wondering how computational invariance is preserved?
embedding: W @ Q
QKV: W @ Q
forward: x -> embedding -> X @ Q -> QKV -> X @ Q @ W @ Q
Is there anything I'm missing?
Coco58323 commented
In general, a linear layer computes Y = X @ W.T (the weight is applied transposed).
As a result, with an orthogonal Q (so Q @ Q.T = I):
Y = (X @ Q) @ (W @ Q).T = X @ Q @ Q.T @ W.T = X @ W.T
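A minimal NumPy sketch of this invariance check (the shapes and the QR-based construction of an orthogonal Q are illustrative assumptions, not from the paper):

```python
import numpy as np

# Hypothetical sizes: 4 tokens, hidden dimension 8.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))   # activations
W = rng.standard_normal((8, 8))   # linear-layer weight

# Random orthogonal matrix Q (Q @ Q.T == I) via QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

Y = X @ W.T                   # original layer: weight applied transposed
Y_rot = (X @ Q) @ (W @ Q).T   # rotated activations, rotated weight

# (W @ Q).T == Q.T @ W.T, so Q @ Q.T cancels to the identity.
print(np.allclose(Y, Y_rot))  # → True
```

The key point is the transpose in `Y = X @ W.T`: because the weight is applied transposed, rotating both the activations and the weight by the same Q makes the two rotations cancel.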
ponytaill commented
> In general, Y = X @ W.T. As a result, Y = (X @ Q) @ (W @ Q).T = X @ W.T

@Coco58323 Thanks!