MenghaoGuo/PCT

question about the normalization on the attention weight

amiltonwong opened this issue · 1 comments

Hi @MenghaoGuo,

From the code in cls and partseg, the attention weights are already normalized by `self.softmax()`. Why did you add the extra line `attention / (1e-9 + attention.sum(dim=1, keepdims=True))` for weight normalization?

Any particular reason?

Thanks~

Hi,
Good question.
Please pay attention to the dimension of the normalization: the softmax and the extra division are applied along different dimensions, so the second step is not redundant. In our experiments, we found that this makes the training process more stable.
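
For reference, here is a minimal standalone sketch of the two normalization steps (shapes and variable names are illustrative, not taken from the repo). Assuming, as in the cls/partseg code, that the softmax is taken over the last dimension, the extra division then rescales along `dim=1`, so each step acts on a different axis of the attention map:

```python
import torch

# Illustrative shapes (assumptions, not the repo's actual sizes)
b, n, c = 2, 1024, 64
x_q = torch.randn(b, n, c)   # queries: (batch, points, channels)
x_k = torch.randn(b, c, n)   # keys:    (batch, channels, points)

energy = torch.bmm(x_q, x_k)                 # (b, n, n) attention logits

# Step 1: softmax over the last dimension -> each row sums to 1
attention = torch.softmax(energy, dim=-1)

# Step 2: divide by the sum over dim=1 -> each column is rescaled to sum to 1
attention = attention / (1e-9 + attention.sum(dim=1, keepdim=True))

print(attention.sum(dim=-1)[0, :3])  # rows no longer sum exactly to 1 after step 2
print(attention.sum(dim=1)[0, :3])   # columns now sum to ~1
```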