openai/finetune-transformer-lm

Concatenating context and embeddings?

windweller opened this issue · 1 comment

Hi,

Congratulations on the paper! Those of us who actually worked on ROCStories know how difficult it is!!

I have a small question about how the embeddings are handled in the code.

we = tf.get_variable("we", [n_vocab+n_special+n_ctx, n_embd], initializer=tf.random_normal_initializer(stddev=0.02))
e = tf.gather(we, X)
h = tf.reduce_sum(e, 2)

I believe this is equivalent to the `tf.nn.embedding_lookup()` that people normally use, so `we` is the word embedding matrix. My question is: what is `n_ctx` (the context embedding)? May I ask how it is used in the model?
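For my own understanding, here is a minimal numpy sketch of what I think the gather + reduce_sum is doing, assuming X packs a token id and a position id into its last dimension (the shapes and variable names below are my assumptions, not taken verbatim from the repo):

```python
# Assumption: X has shape [n_batch, n_ctx, 2], where X[:, :, 0] holds token ids
# and X[:, :, 1] holds position ids offset into the last n_ctx rows of `we`.
import numpy as np

n_vocab, n_special, n_ctx, n_embd = 10, 2, 4, 8
we = np.random.randn(n_vocab + n_special + n_ctx, n_embd)  # token + special + position rows

n_batch = 1
tokens = np.random.randint(0, n_vocab, size=(n_batch, n_ctx))            # word ids
positions = n_vocab + n_special + np.arange(n_ctx)[None, :]              # position ids
X = np.stack([tokens, np.broadcast_to(positions, tokens.shape)], axis=2)  # [n_batch, n_ctx, 2]

e = we[X]           # like tf.gather(we, X) -> [n_batch, n_ctx, 2, n_embd]
h = e.sum(axis=2)   # like tf.reduce_sum(e, 2): word embedding + position embedding
print(h.shape)      # (1, 4, 8)
```

So if I read it right, the sum over axis 2 is just adding the word embedding and the position embedding for each token.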

Thank you very much!


Now that I've looked at the code more closely, is this an artifact of the Transformer decoder?

I always thought the positional encoding (sine wave) was concatenated to the word embedding... but it turns out it's summed...
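To make sure I'm describing the distinction correctly, here is a small sketch contrasting the two options I had in mind, using the fixed sinusoidal encoding from the original Transformer paper purely for illustration (if I read the code right, this repo learns its position embeddings rather than using the sine wave):

```python
import numpy as np

def sinusoidal(n_ctx, n_embd):
    # Fixed sinusoidal positional encoding from "Attention Is All You Need".
    pos = np.arange(n_ctx)[:, None]
    i = np.arange(n_embd)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / n_embd)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))  # [n_ctx, n_embd]

n_ctx, n_embd = 4, 8
word_emb = np.random.randn(n_ctx, n_embd)
pos_enc = sinusoidal(n_ctx, n_embd)

summed = word_emb + pos_enc                                   # what the Transformer does: width stays n_embd
concatenated = np.concatenate([word_emb, pos_enc], axis=-1)   # what I had assumed: width becomes 2 * n_embd
print(summed.shape, concatenated.shape)                       # (4, 8) (4, 16)
```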