Concatenating context and embeddings?
windweller opened this issue · 1 comment
windweller commented
Hi,

Congratulations on the paper! Those of us who have actually worked on ROCStories know how difficult it is!

I have a small question about how the embeddings are handled in the code:
```python
we = tf.get_variable("we", [n_vocab + n_special + n_ctx, n_embd],
                     initializer=tf.random_normal_initializer(stddev=0.02))
e = tf.gather(we, X)
h = tf.reduce_sum(e, 2)
```
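To make sure I'm reading the shapes correctly, here's a toy sketch of what I think is happening (TF2-style, toy sizes; `we` and `X` are the repo's names, everything else is my own guess):

```python
import tensorflow as tf

# toy sizes, not the paper's hyperparameters
n_vocab, n_special, n_ctx, n_embd, batch = 10, 3, 8, 4, 2

# one table holds word, special-token, AND position embeddings
we = tf.random.normal([n_vocab + n_special + n_ctx, n_embd], stddev=0.02)

# token ids index the first n_vocab + n_special rows
tokens = tf.random.uniform([batch, n_ctx], maxval=n_vocab + n_special,
                           dtype=tf.int32)
# position ids index the last n_ctx rows of the same table
positions = tf.tile(
    tf.range(n_vocab + n_special, n_vocab + n_special + n_ctx)[None, :],
    [batch, 1])

# X stacks token ids and position ids along a new last axis: [batch, n_ctx, 2]
X = tf.stack([tokens, positions], axis=2)

e = tf.gather(we, X)     # [batch, n_ctx, 2, n_embd]
h = tf.reduce_sum(e, 2)  # [batch, n_ctx, n_embd]: word emb + position emb, summed
print(h.shape)           # (2, 8, 4)
```

If that reading is right, the `reduce_sum` over axis 2 is what adds the word and position embeddings together.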
I believe this is equivalent to the `tf.nn.embedding_lookup()` that people normally use, so `we` is the word embedding matrix. My question is: what is `n_ctx` (the context embedding)? May I ask how this is used in the model?
Thank you very much!
Now that I've looked at the code more closely, is this an artifact of the Transformer decoder?
windweller commented
I always thought the positional encoding (the sine wave) was concatenated to the word embedding... but it turns out it's summed...
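For reference, here's a quick NumPy sketch (my own, not from this repo) of the sinusoidal encoding from "Attention Is All You Need"; there too it is added to the word embeddings rather than concatenated:

```python
import numpy as np

def sinusoidal_encoding(n_ctx, n_embd):
    # classic sine/cosine table: even dims get sin, odd dims get cos
    pos = np.arange(n_ctx)[:, None]                            # [n_ctx, 1]
    i = np.arange(n_embd)[None, :]                             # [1, n_embd]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / n_embd)  # [n_ctx, n_embd]
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

x = np.random.randn(8, 16)          # word embeddings: [n_ctx, n_embd]
h = x + sinusoidal_encoding(8, 16)  # summed, not concatenated
```

The difference in this repo, as far as I can tell, is just that the position embeddings are learned (the extra `n_ctx` rows of `we`) instead of fixed sinusoids.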