jadore801120/attention-is-all-you-need-pytorch

Confusion regarding embedding space

IamAdiSri opened this issue · 2 comments

The paper says, "...the same weight matrix is shared between the two embedding layers...", referring to the encoder and decoder embedding layers. However, in the lines below I can see that the encoder initializes its own embedding matrix, separate from the one in the decoder. Can you explain why this is so?

self.src_word_emb = nn.Embedding(n_src_vocab, d_word_vec, padding_idx=pad_idx)

self.trg_word_emb = nn.Embedding(n_trg_vocab, d_word_vec, padding_idx=pad_idx)

The weights are shared in the __init__ function of the Transformer class:

if trg_emb_prj_weight_sharing:

if emb_src_trg_weight_sharing:
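
Roughly, the sharing amounts to the following. This is a minimal, self-contained sketch with illustrative names and sizes, not the repo's code: both embedding layers are constructed separately, but when the sharing flag is set, one layer's weight is reassigned to the other's Parameter, so only a single matrix is actually stored and trained.

import torch.nn as nn

# Minimal sketch of weight tying (illustrative names/sizes, not the repo's code).
# Both embeddings are constructed separately; tying makes them point at the
# same underlying Parameter, so only one matrix is stored and updated.
vocab_size, d_word_vec, pad_idx = 100, 512, 0

src_word_emb = nn.Embedding(vocab_size, d_word_vec, padding_idx=pad_idx)
trg_word_emb = nn.Embedding(vocab_size, d_word_vec, padding_idx=pad_idx)

# Tie the weights: after this assignment both layers share one tensor, and any
# gradient update to one is an update to the other.
src_word_emb.weight = trg_word_emb.weight

assert src_word_emb.weight is trg_word_emb.weight  # same Parameter object

Tying the target embedding to the pre-softmax projection (the trg_emb_prj_weight_sharing flag) follows the same pattern; sharing between source and target embeddings only makes sense when the two sides use a common vocabulary.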

I am also confused about whether the key, query, and value projection matrices get trained, or whether the embeddings get trained. Please help.
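
To make the distinction concrete, here is an illustrative sketch (not the repo's code): the embedding table and the query/key/value projections are all ordinary trainable parameters. The embeddings produce the input vectors, the projections turn them into Q, K, and V, and after backpropagation both sets of weights have gradients and are updated by the optimizer.

import torch
import torch.nn as nn

# Illustrative sketch (not the repo's code): the embedding table and the
# query/key/value projections are all trainable parameters.
d_model, vocab_size = 8, 20

emb = nn.Embedding(vocab_size, d_model)
w_q = nn.Linear(d_model, d_model, bias=False)  # query projection
w_k = nn.Linear(d_model, d_model, bias=False)  # key projection
w_v = nn.Linear(d_model, d_model, bias=False)  # value projection

tokens = torch.tensor([[1, 2, 3]])
x = emb(tokens)                    # token embeddings feed the attention layer
q, k, v = w_q(x), w_k(x), w_v(x)   # projections produce queries, keys, values

attn = torch.softmax(q @ k.transpose(-2, -1) / d_model ** 0.5, dim=-1)
out = (attn @ v).sum()
out.backward()

# Both the embedding and the projection weights received gradients.
print(emb.weight.grad is not None, w_q.weight.grad is not None)  # True True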