Use of mask
Opened this issue · 0 comments
c120129754 commented
The mask is used when selecting the topK nodes, but it is applied by addition rather than multiplication, which is the more common way of using a mask. Could you please explain why, or give an example of how the mask influences the calculated score in the topK selection?
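
To make the question concrete, here is a minimal sketch of the two conventions as I understand them. It is not the repository's code: the function names, the `NEG_INF` fill value, and the assumption that the additive mask is 0 for valid nodes and a large negative number for padded nodes are all mine.

```python
# Hypothetical sketch, not the repository's code: compares an additive mask
# with a multiplicative mask when selecting the top-k scores.

NEG_INF = -1e9  # assumed "large negative" fill value for masked-out nodes


def topk_indices(scores, k):
    """Return the indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def topk_additive(scores, valid, k):
    """Additive mask: add 0 to valid nodes and NEG_INF to padded ones."""
    masked = [s + (0.0 if v else NEG_INF) for s, v in zip(scores, valid)]
    return topk_indices(masked, k)


def topk_multiplicative(scores, valid, k):
    """Multiplicative mask: multiply valid nodes by 1 and padded ones by 0."""
    masked = [s * (1.0 if v else 0.0) for s, v in zip(scores, valid)]
    return topk_indices(masked, k)


# Scores may be negative, e.g. raw projections before any squashing.
scores = [-0.5, -1.2, 0.3, 2.0]
valid = [True, True, True, False]  # the last node is padding

additive = topk_additive(scores, valid, k=2)
multiplicative = topk_multiplicative(scores, valid, k=2)
print(additive, multiplicative)
```

Under these assumptions the additive version returns only valid nodes, while the multiplicative version sets the padded node's score to 0, which outranks the negative scores of valid nodes and so gets selected. If the mask in this repository follows the same convention, that might be the reason for the addition, but I would appreciate confirmation.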