Did not multiply embedding weights by sqrt(d_model)
orena1 opened this issue · 4 comments
fabrahman commented
@orena1 The code actually has * math.sqrt(self.d_model) in the positional embedding class, applied in the forward method.
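For reference, this is a minimal sketch of how that scaling typically looks in a PyTorch Transformer implementation (class and attribute names here are assumptions, not the repo's actual code): the factor is applied to the lookup output in forward, not baked into the weight matrix.

```python
import math
import torch.nn as nn

class ScaledEmbedding(nn.Module):
    """Hypothetical sketch: token embedding scaled by sqrt(d_model)."""
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.lut = nn.Embedding(vocab_size, d_model)
        self.d_model = d_model

    def forward(self, x):
        # Scale the embedding output, as in "Attention Is All You Need" (Sec. 3.4).
        return self.lut(x) * math.sqrt(self.d_model)
```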
zhangxixi0904 commented
Does anybody know the reason for multiplying the embedding weights by sqrt(d_model)?
wangzelin-em commented
@orena1 Hi, the implementation also doesn't share the embedding weights, right?
Yes, the implementation doesn't share the embedding weights.
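For anyone looking to add it, weight sharing (tying the embedding matrix to the pre-softmax output projection, as described in the original paper) usually amounts to pointing both layers at the same parameter tensor. This is a hypothetical sketch, not the repo's code; names are assumptions.

```python
import torch.nn as nn

class TiedGenerator(nn.Module):
    """Hypothetical sketch: output projection tied to the token embedding."""
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size, bias=False)
        # Both weight tensors have shape (vocab_size, d_model),
        # so they can share the same parameter.
        self.proj.weight = self.embedding.weight

    def forward(self, hidden):
        # hidden: (batch, seq_len, d_model) -> logits over the vocabulary
        return self.proj(hidden)
```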