About self attention

Question


Janspiry opened this issue 4 years ago · 0 comments

Hello, in the Self_Attn module, gamma is initialized to torch.zeros(1), and the output is then computed as out = x + gamma*out. Why is gamma initialized to zero rather than to some other value, such as torch.ones(1)?
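A minimal sketch of what a zero gamma does at initialization. It is written in NumPy rather than PyTorch and is not the repository's code. It assumes a SAGAN-style block: the 1x1 query/key/value convolutions are treated as plain channel-mixing matrices, the query/key channels are reduced to C // 8, and the feature map is flattened to shape (C, N) with N = H * W. The class name `SelfAttnSketch` and the `gamma_init` parameter are hypothetical. With gamma = 0 the block returns x exactly, so at the start of training it behaves as an identity (pure residual) mapping. Gamma still receives a gradient, because the derivative of x + gamma*out with respect to gamma is out, which is generally non-zero. Training can therefore move gamma away from zero and gradually mix in the attention output. The SAGAN paper gives this motivation: the network first relies on local convolutional features and later learns to weight non-local evidence. With gamma = 1, an untrained attention branch is added at full strength from the first step.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class SelfAttnSketch:
    """Hypothetical NumPy stand-in for a SAGAN-style Self_Attn block.

    Assumption: 1x1 convolutions are modeled as channel-mixing matrices and
    the input is flattened to shape (C, N), where N = H * W positions.
    """

    def __init__(self, channels, gamma_init=0.0, seed=0):
        rng = np.random.default_rng(seed)
        reduced = max(channels // 8, 1)
        self.w_query = rng.standard_normal((reduced, channels)) * 0.1
        self.w_key = rng.standard_normal((reduced, channels)) * 0.1
        self.w_value = rng.standard_normal((channels, channels)) * 0.1
        # A learnable scalar in the real module; torch.zeros(1) corresponds to 0.0 here.
        self.gamma = np.array([gamma_init])

    def __call__(self, x):
        q = self.w_query @ x                    # (C', N)
        k = self.w_key @ x                      # (C', N)
        v = self.w_value @ x                    # (C, N)
        # attention[i, j]: how strongly position i attends to position j.
        attention = softmax(q.T @ k, axis=-1)   # (N, N), each row sums to 1
        # out[:, i] = sum_j attention[i, j] * v[:, j]
        out = v @ attention.T                   # (C, N)
        # Residual connection scaled by gamma, as in out = x + gamma*out.
        return x + self.gamma * out, attention


x = np.random.default_rng(1).standard_normal((16, 4 * 4))  # C = 16, H = W = 4
y0, attn = SelfAttnSketch(16, gamma_init=0.0)(x)
y1, _ = SelfAttnSketch(16, gamma_init=1.0)(x)
print(np.allclose(y0, x))  # True: with gamma = 0 the block is an identity map
print(np.allclose(y1, x))  # False: the attention output is added at full strength
```

Zero initialization leaves the surrounding network's behavior unchanged when the block is inserted. It does not prevent the attention branch from learning, because the scale gamma is itself a trainable parameter.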