Why is the total similarity kept at 1 in the corresponding row of Si, which may lead to wrong modifications?

Question

Closed this issue 4 months ago · 4 comments

Hi author, it was nice to read such an interesting paper. I would like to ask why the total similarity is kept at 1 in the corresponding row of Si. Couldn't this lead to wrong modifications?

Answer 1 · 2024-05-14T15:24:01.000Z

jj-ccc commented 4 months ago

Answer 2 · 2024-05-15T11:19:14.000Z

Applying softmax ensures that the similarities in each row sum to 1.
Some features do not need modification, meaning they are dissimilar to every token; in such cases, it is reasonable for the total similarity of that row to be lower.
After one token is removed, the total similarity of each row can range from 0 to 1.
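
A minimal NumPy sketch of the idea above, under stated assumptions: the logits are made up, and the last column is only a placeholder for the token that gets removed (this thread does not say which token that is or how the similarities are computed). It shows that rows sum to 1 right after softmax, and that after dropping one column the row total falls anywhere between 0 and 1.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical logits: 2 features (rows) x 4 tokens (columns).
# Assumption: the last column is the token that is removed afterwards.
logits = np.array([
    [4.0, 0.0, 0.0, 0.0],  # feature that is similar to token 0
    [0.0, 0.0, 0.0, 4.0],  # feature that is dissimilar to every kept token
])

S_full = softmax(logits, axis=1)   # every row sums to 1
S = S_full[:, :-1]                 # remove the last token

row_sums_full = S_full.sum(axis=1)
row_sums = S.sum(axis=1)           # now anywhere in [0, 1]

print("row sums before removal:", row_sums_full)
print("row sums after removal: ", row_sums)
```

In this toy case the first row keeps almost all of its mass, while the second row keeps very little, so a feature that matches none of the kept tokens ends up with a low total similarity.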

Answer 3 · 2024-05-30T15:35:19.000Z

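Hello, I noticed this issue as well. Based on your reply, I understand the intent behind this design, but I still have a small question: during training, how is a large value consistently assigned in Si to the features that do not need to change?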

Answer 4 · 2024-06-05T06:58:18.000Z

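Hello, I noticed this issue as well. Based on your reply, I understand the intent behind this design, but I still have a small question: during training, how is a large value consistently assigned in Si to the features that do not need to change?

When all the other values are small, the nature of softmax means that the remaining value becomes correspondingly larger.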
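
The softmax property referred to here can be written out as a short derivation. The symbols are assumptions introduced for illustration: $z_1, \dots, z_n$ are the logits of one row and $s_j$ are the resulting similarities.

$$
s_k = \frac{e^{z_k}}{\sum_{j=1}^{n} e^{z_j}}, \qquad \sum_{j=1}^{n} s_j = 1 \;\Longrightarrow\; s_k = 1 - \sum_{j \neq k} s_j .
$$

Equivalently,

$$
s_k = \frac{1}{1 + \sum_{j \neq k} e^{z_j - z_k}} \;\longrightarrow\; 1 \quad \text{as } z_j - z_k \to -\infty \text{ for all } j \neq k .
$$

So no separate mechanism is needed to push the remaining entry up: once the other logits in the row are low relative to it, the normalization assigns it nearly all of the mass.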