w1oves/Rein

Why is the total similarity in each row of Si kept at 1? Could this lead to wrong modifications?

Closed this issue · 4 comments

Hi author, it was a pleasure to read such an interesting paper. I would like to ask why the total similarity in each row of Si is kept at 1. Couldn't this lead to wrong modifications?

  • The application of softmax ensures that the similarity of each row sums to 1.
  • Some features do not need modification, meaning they are dissimilar to every token; in such cases, it would be reasonable for the total similarity of that row to be lower.
  • After one token's column is removed, the total similarity of each row is no longer fixed at 1 and can fall anywhere between 0 and 1.
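
The three points above can be checked numerically. The following is a minimal sketch rather than the repository's implementation: the logits are made-up values, the names `softmax`, `logits`, `full_row_sums`, and `kept_row_sums` are hypothetical, and it assumes the removed token is the first column.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical feature-token logits for two features and m = 4 tokens.
# Row 0: a feature that matches token 3, so it should be modified.
# Row 1: a feature that matches only the first token, so it should be left alone.
logits = [
    [0.0, 0.5, 6.0, 0.2],
    [6.0, 0.1, 0.3, 0.2],
]

S = [softmax(row) for row in logits]

# With all m columns, every row sums to 1 (first bullet).
full_row_sums = [sum(row) for row in S]

# Assumption: the removed token is column 0. The remaining row total
# is then free to be anywhere between 0 and 1 (third bullet).
kept_row_sums = [sum(row[1:]) for row in S]

print(full_row_sums)
print(kept_row_sums)
```

In this sketch the kept total for row 0 stays close to 1, while for row 1 it drops close to 0, which is the behaviour the second bullet asks for: a feature that is dissimilar to the remaining tokens receives almost no modification.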

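Hello, I noticed the same issue. Based on your reply I understand the intent behind this design, but I still have one small question: during training, how does Si consistently assign a large value to the features that do not need to be changed?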

When all the other values are small, the softmax property means the remaining value increases correspondingly.
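
This follows directly from the softmax normalization. As a sketch, write $z_{k,j}$ for the logit between feature $k$ and token $j$ (notation assumed here, not taken from the thread), with $m$ tokens and the first token being the one that is later removed:

$$
S_i(k,1) = \frac{e^{z_{k,1}}}{\sum_{j=1}^{m} e^{z_{k,j}}} = 1 - \sum_{j=2}^{m} S_i(k,j)
$$

So if every $S_i(k,j)$ with $j \ge 2$ is at most $\varepsilon$, then $S_i(k,1) \ge 1 - (m-1)\varepsilon$. No separate mechanism is needed to push the first entry up: training only has to keep the similarities to the other tokens small for that feature.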