Error in Equation 16?
zhongmz opened this issue · 1 comments
zhongmz commented
It appears that the correct formulation should be the following?
$$
q_{t,i} = [q^{C}_{t,i}; q_{t}^{R}],
$$
DeepSeekDDM commented
The formula is correct. q^R is multi-head (it carries the head index i), and only k^R is shared across heads.
You can refer to the illustration of DeepSeek-V2 for an intuitive understanding.
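
For a concrete view of the shapes involved, here is a minimal NumPy sketch of the point above. It is an illustration only, not the model's implementation: the sizes `n_heads`, `d_c`, `d_r` and the random tensors are assumptions made up for the example, and the RoPE rotation itself is left out. It only shows that q^R has one vector per head, while a single k^R is broadcast to all heads before concatenation.

```python
import numpy as np

# Hypothetical sizes for illustration only (not the real model config).
n_heads, d_c, d_r = 4, 8, 2
rng = np.random.default_rng(0)

# Query parts for one token t: both carry the head index i.
q_C = rng.standard_normal((n_heads, d_c))  # q^C_{t,i}
q_R = rng.standard_normal((n_heads, d_r))  # q^R_{t,i}, one per head

# Key parts: k^C_{t,i} is per-head, k^R_t is one vector shared by all heads.
k_C = rng.standard_normal((n_heads, d_c))  # k^C_{t,i}
k_R = rng.standard_normal((d_r,))          # k^R_t, shared

# q_{t,i} = [q^C_{t,i}; q^R_{t,i}]
q = np.concatenate([q_C, q_R], axis=-1)

# k_{t,i} = [k^C_{t,i}; k^R_t]: the shared k^R is broadcast to every head.
k = np.concatenate([k_C, np.broadcast_to(k_R, (n_heads, d_r))], axis=-1)

print(q.shape, k.shape)
```

In this sketch the last `d_r` entries of `k` are identical for every head, whereas the last `d_r` entries of `q` differ from head to head, which is why the query side keeps the subscript i and the key side does not.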