theeluwin/pytorch-sgns

Different embeddings for input/output words?

phillynch7 opened this issue · 2 comments

Hey there, great skipgram example, so thank you for that.

I have a question: why did you decide to use different embeddings for the "input" words and the "output"/"negative" words? See the lines below:
https://github.com/theeluwin/pytorch-sgns/blob/master/model.py#L29:L30

I imagine this could give better performance on some problems, but I haven't been able to test it myself yet. Thanks for the help!

This implementation is based on the word2vec paper that introduced skip-gram with negative sampling (https://arxiv.org/abs/1310.4546). Using a single shared embedding (also known as siamese modeling) works as well, but following the paper, the model is treated as a 2-layer neural network with one hidden layer, where W1 holds the 'input vectors' and W2 holds the 'output vectors'.
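For reference, here is a minimal sketch (not the repo's exact code) of how the two separate embedding tables enter the SGNS loss. The `ivectors`/`ovectors` names mirror the lines linked above, but the shapes and the loss reduction here are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SGNS(nn.Module):
    """Skip-gram with negative sampling, using separate input (W1)
    and output (W2) embedding matrices."""

    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.ivectors = nn.Embedding(vocab_size, embedding_dim)  # W1: "input" vectors
        self.ovectors = nn.Embedding(vocab_size, embedding_dim)  # W2: "output" vectors

    def forward(self, iword, owords, nwords):
        # iword:  (batch,)            center words
        # owords: (batch, context)    true context words
        # nwords: (batch, negatives)  negatively sampled words
        v = self.ivectors(iword).unsqueeze(2)        # (batch, dim, 1)
        u_pos = self.ovectors(owords)                # (batch, context, dim)
        u_neg = self.ovectors(nwords)                # (batch, negatives, dim)
        pos_score = torch.bmm(u_pos, v).squeeze(2)   # (batch, context)
        neg_score = torch.bmm(u_neg, v).squeeze(2)   # (batch, negatives)
        # SGNS objective: maximize log-sigmoid of positive scores and
        # log-sigmoid of negated negative scores.
        loss = -(F.logsigmoid(pos_score).mean(1)
                 + F.logsigmoid(-neg_score).mean(1)).mean()
        return loss
```

With a shared (siamese) embedding you would replace `self.ovectors` with `self.ivectors` everywhere, which also trains but departs from the paper's interpretation of W1 and W2 as distinct layers.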

Ahh, I see that in the paper now. I should have read it more carefully; I was mostly focused on working out the loss function.

Appreciate your help, thanks!