Confusion regarding embedding space
IamAdiSri opened this issue · 2 comments
IamAdiSri commented
The paper says, "...the same weight matrix is shared between the two embedding layers..." referring to the encoder and decoder embedding layers respectively. However, in the lines below I can see that the encoder initializes its own embedding matrix, separate from the one in the decoder. Can you explain why this is so?
kian98 commented
Weights are shared in the `__init__` function of the `Transformer` class.
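A minimal sketch of what that tying looks like, assuming PyTorch; the class and attribute names (`src_emb`, `tgt_emb`) are hypothetical and not taken from this repository:

```python
import torch.nn as nn

class Transformer(nn.Module):
    # Hypothetical minimal sketch: only the embedding layers are shown.
    def __init__(self, vocab_size=1000, d_model=512):
        super().__init__()
        # The encoder and decoder each construct their own nn.Embedding...
        self.src_emb = nn.Embedding(vocab_size, d_model)
        self.tgt_emb = nn.Embedding(vocab_size, d_model)
        # ...but assigning one weight to the other ties them, so both
        # layers read from (and backpropagate into) one shared parameter.
        self.tgt_emb.weight = self.src_emb.weight

model = Transformer()
# Both modules now reference the same underlying tensor object.
print(model.src_emb.weight is model.tgt_emb.weight)  # True
```

So even though two `nn.Embedding` modules are instantiated, after the assignment in `__init__` there is only one weight matrix being trained.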
chaudharynabin6 commented
I'm also confused about this. Do the key, query, and value matrices get trained, or does the embedding get trained? Please help.