jadore801120/attention-is-all-you-need-pytorch

Confusion regarding embedding space

IamAdiSri opened this issue · 2 comments

The paper says, "...the same weight matrix is shared between the two embedding layers...", referring to the encoder and decoder embedding layers. However, in the lines below I can see that the encoder initializes its own embedding matrix, separate from the one in the decoder. Can you explain why this is so?

self.src_word_emb = nn.Embedding(n_src_vocab, d_word_vec, padding_idx=pad_idx)

self.trg_word_emb = nn.Embedding(n_trg_vocab, d_word_vec, padding_idx=pad_idx)

The weights are shared in the __init__ function of the Transformer class:

if trg_emb_prj_weight_sharing:

if emb_src_trg_weight_sharing:
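
Roughly, the sharing amounts to the following. This is a minimal, self-contained sketch with illustrative names and sizes, not the repo's code: both embedding layers are constructed separately, but when the sharing flag is set, one layer's weight is reassigned to the other's Parameter, so only a single matrix is actually stored and trained.

import torch.nn as nn

# Minimal sketch of weight tying (illustrative names/sizes, not the repo's code).
# Both embeddings are constructed separately; tying makes them point at the
# same underlying Parameter, so only one matrix is stored and updated.
vocab_size, d_word_vec, pad_idx = 100, 512, 0

src_word_emb = nn.Embedding(vocab_size, d_word_vec, padding_idx=pad_idx)
trg_word_emb = nn.Embedding(vocab_size, d_word_vec, padding_idx=pad_idx)

# Tie the weights: after this assignment both layers share one tensor, and any
# gradient update to one is an update to the other.
src_word_emb.weight = trg_word_emb.weight

assert src_word_emb.weight is trg_word_emb.weight  # same Parameter object

Tying the target embedding to the pre-softmax projection (the trg_emb_prj_weight_sharing flag) follows the same pattern; sharing between source and target embeddings only makes sense when the two sides use a common vocabulary.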

I am also confused about whether the key, query, and value projection matrices get trained, or whether the embeddings get trained. Please help.
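
To make the distinction concrete, here is an illustrative sketch (not the repo's code): the embedding table and the query/key/value projections are all ordinary trainable parameters. The embeddings produce the input vectors, the projections turn them into Q, K, and V, and after backpropagation both sets of weights have gradients and are updated by the optimizer.

import torch
import torch.nn as nn

# Illustrative sketch (not the repo's code): the embedding table and the
# query/key/value projections are all trainable parameters.
d_model, vocab_size = 8, 20

emb = nn.Embedding(vocab_size, d_model)
w_q = nn.Linear(d_model, d_model, bias=False)  # query projection
w_k = nn.Linear(d_model, d_model, bias=False)  # key projection
w_v = nn.Linear(d_model, d_model, bias=False)  # value projection

tokens = torch.tensor([[1, 2, 3]])
x = emb(tokens)                    # token embeddings feed the attention layer
q, k, v = w_q(x), w_k(x), w_v(x)   # projections produce queries, keys, values

attn = torch.softmax(q @ k.transpose(-2, -1) / d_model ** 0.5, dim=-1)
out = (attn @ v).sum()
out.backward()

# Both the embedding and the projection weights received gradients.
print(emb.weight.grad is not None, w_q.weight.grad is not None)  # True True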