spcl/QuaRot

Question about rotation.

Closed this issue · 3 comments

I notice that the rotation of the attention and MLP weights is the opposite of what is shown in Fig. 4 & 5 of the paper. I am wondering how computational invariance is maintained?
embedding: W @ Q
QKV: W @ Q
forward: x -> embedding -> X @ Q -> QKV -> X @ Q @ W @ Q.
Is there anything I'm missing?

Hi @mxjmtxrm, I have the same question. Have you figured it out?

In general, a linear layer computes Y = X @ W.T.
As a result, Y = (X @ Q) @ (W @ Q).T = X @ Q @ Q.T @ W.T = X @ W.T, since Q is orthogonal (Q @ Q.T = I). So multiplying both the activations and the weight by Q on the right leaves the output unchanged.
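The identity above is easy to check numerically. Here is a minimal NumPy sketch: it uses a random orthogonal matrix from a QR decomposition as a stand-in for QuaRot's Hadamard-based rotation, and the `Y = X @ W.T` convention of a standard linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))    # activations: batch x hidden
W = rng.standard_normal((16, 8))   # linear weight: out_features x in_features

# Random orthogonal Q (hypothetical stand-in for the paper's rotation)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

Y = X @ W.T                        # original output
Y_rot = (X @ Q) @ (W @ Q).T        # rotated activations, rotated weights

# Q @ Q.T = I, so the two outputs match up to floating-point error
print(np.allclose(Y, Y_rot))
```

This is why both the embedding and the QKV weights can be multiplied by Q on the right: the transpose inside the linear layer turns one of the two rotations into Q.T, and they cancel.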
@Coco58323 Thanks!