SamLynnEvans/Transformer

Did not multiply embedding weights by sqrt(d_model)

orena1 opened this issue · 4 comments

Hi,
In this line:

`return self.embed(x)`

I think you need to multiply the embedding by sqrt(d_model)
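For illustration, here is a minimal pure-Python sketch of the suggested scaling (toy names and a toy lookup table, not the repo's actual torch code; in the repo the fix would amount to something like `return self.embed(x) * math.sqrt(self.d_model)`):

```python
import math
import random

def make_embedding(vocab_size, d_model, seed=0):
    """Toy lookup table: one d_model-sized vector per token id."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab_size)]

def embed(table, token_ids, d_model):
    # Scale the looked-up vectors by sqrt(d_model), as the issue suggests.
    scale = math.sqrt(d_model)
    return [[w * scale for w in table[t]] for t in token_ids]

table = make_embedding(vocab_size=10, d_model=4)
vecs = embed(table, [1, 2], d_model=4)
```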

@orena1 Hi, the implementation also didn't share the embedding weights, right?

@orena1 The code actually applies `* math.sqrt(self.d_model)` in the positional embedding class, in its forward method.

Does anybody know the reason for multiplying the embedding weights by sqrt(d_model)?
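One common explanation (a hedged reading of the original Transformer paper, not something this repo documents): with tied embedding/output weights, the embedding matrix is typically initialized with small components (std around 1/sqrt(d_model)), so each embedding vector has norm near 1 and components far smaller than the sinusoidal positional encodings, whose components lie in [-1, 1]. Scaling by sqrt(d_model) brings the embedding components back to O(1) so the positional signal does not drown out the token signal. A quick numeric check of that magnitude argument:

```python
import math
import random

d_model = 512
rng = random.Random(0)
# Assumed initialization for tied embeddings: std = 1/sqrt(d_model),
# so each component is small and the vector norm is about 1.
vec = [rng.gauss(0, 1 / math.sqrt(d_model)) for _ in range(d_model)]
norm = math.sqrt(sum(w * w for w in vec))

scaled = [w * math.sqrt(d_model) for w in vec]
scaled_norm = math.sqrt(sum(w * w for w in scaled))
# After scaling, components are O(1), comparable to the positional
# encodings, so the sum (embedding + positional encoding) keeps both signals.
```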

> @orena1 Hi, the implementation also didn't share the embedding weights, right?

Yes, the implementation didn't share the embedding weights.
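For anyone wondering what weight sharing means here, a minimal pure-Python sketch (hypothetical names, not the repo's code): the same matrix `E` serves as both the input embedding and the pre-softmax output projection.

```python
import math
import random

def tied_model(vocab_size, d_model, seed=0):
    rng = random.Random(seed)
    # One shared matrix: rows are token embeddings AND output-projection rows.
    E = [[rng.gauss(0, 1 / math.sqrt(d_model)) for _ in range(d_model)]
         for _ in range(vocab_size)]

    def embed(token_id):
        # Input side: look up the row and scale by sqrt(d_model).
        return [w * math.sqrt(d_model) for w in E[token_id]]

    def logits(hidden):
        # Output side: score for token t is the dot product hidden . E[t],
        # reusing the same matrix instead of a separate output layer.
        return [sum(h * w for h, w in zip(hidden, row)) for row in E]

    return E, embed, logits

E, embed, logits = tied_model(vocab_size=6, d_model=4)
scores = logits(embed(3))
```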