Some questions about scaling
miganchuanbo opened this issue · 0 comments
It seems we need to scale up Q and K when using cosine similarity. But what is the reason for scaling Q before applying the rotary embedding?
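One property that bears on the ordering question: a rotary embedding applies a position-dependent rotation to each pair of dimensions. That operation is linear and norm-preserving, so multiplying Q by a scalar before or after the rotation gives the same attention logits, and l2-normalizing before or after it gives the same vectors. The minimal NumPy sketch below checks this. It assumes a "rotate-half" RoPE formulation, a hypothetical temperature `scale = 10.0`, and hypothetical helpers `rotary` and `l2norm`. It is not the repository's actual code.

```python
import numpy as np

def rotary(x, base=10000.0):
    # x: (seq_len, dim) with even dim; assumed rotate-half style RoPE
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # rotate each (x1[i], x2[i]) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def l2norm(x):
    # normalize each row to unit length (cosine-sim attention)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
k = rng.standard_normal((8, 16))
scale = 10.0  # hypothetical temperature for cosine-sim attention

# Order A: normalize and scale q, then apply rotary
logits_a = rotary(l2norm(q) * scale) @ rotary(l2norm(k)).T
# Order B: normalize, apply rotary, then scale the logits
logits_b = scale * (rotary(l2norm(q)) @ rotary(l2norm(k)).T)

print(np.allclose(logits_a, logits_b))  # True
```

Because the two orders agree up to floating-point error, any reason to scale Q before the rotary step would come from implementation convenience rather than from a change in the result.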