question about the normalization on the attention weight
amiltonwong opened this issue · 1 comment
amiltonwong commented
Hi @MenghaoGuo,
In the code for cls and partseg, the attention weights are already normalized by self.softmax(). Why did you add the extra line attention / (1e-9 + attention.sum(dim=1, keepdims=True)) for weight normalization? Is there a particular reason?
Thanks~
MenghaoGuo commented
Hi,
Good question.
Please be careful about the dimension of the normalization: the softmax and this extra division normalize along different dimensions, so the second step is not redundant. In our experiments, we found that this makes the training process more stable.
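
For reference, here is a minimal sketch of the two-step normalization on a toy tensor (not the repository code); it assumes the softmax is applied over the last dimension, as with nn.Softmax(dim=-1):

```python
import torch

# Toy attention map: batch of 1, n = 4 points, so "energy" has shape (b, n, n).
energy = torch.randn(1, 4, 4)

# Step 1: softmax over the last dimension, as self.softmax() would do.
# After this, every row of the attention map sums to 1.
attention = torch.softmax(energy, dim=-1)
print(attention.sum(dim=-1))  # ~1.0 per row
print(attention.sum(dim=1))   # generally NOT 1.0 per column

# Step 2: the extra line from the question -- divide by the sum over dim=1.
# This re-normalizes along a different axis: afterwards each column sums to 1,
# which is why the division is not redundant with the softmax.
attention = attention / (1e-9 + attention.sum(dim=1, keepdim=True))
print(attention.sum(dim=1))   # ~1.0 per column
```

In other words, the softmax normalizes one axis of the attention map and the division re-normalizes the other, and the combination was observed to train more stably.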