wlin12/wang2vec

Using both negative sampling and nce


Greetings,

First of all -- thank you very much for publishing this excellent word embedding tool. I am using it in my MSc thesis on dependency parsing, and the structured skip-gram model seems to outperform all the alternatives. I will be very happy to cite your article in my thesis.

I have a question regarding the use of negative sampling and nce. As I understand it from the article "Distributed Representations of Words and Phrases and their Compositionality" by Mikolov et al., negative sampling and nce are two different approaches to distinguishing data from noise. After experimenting a bit with wang2vec, I have found that it is possible to pass a positive integer value to both the negative sampling parameter and the nce parameter at the same time.
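For reference, the standard per-example forms of the two objectives differ in a single correction term. The notation below loosely follows Mikolov et al.: $s(w) = {v'_w}^{\top} v_{w_I}$ is the score of output word $w$ given input word $w_I$, $w_O$ is the observed output word, $w_1, \dots, w_k$ are $k$ words drawn from the noise distribution $P_n$, and $\sigma$ is the logistic function. wang2vec's exact parameterization (bias terms, choice of $P_n$) may differ; this is only a sketch of the textbook formulations.

$$
\begin{aligned}
J_{\text{NEG}} &= \log \sigma\big(s(w_O)\big) + \sum_{i=1}^{k} \log \sigma\big(-s(w_i)\big) \\
J_{\text{NCE}} &= \log \sigma\big(s(w_O) - \log k P_n(w_O)\big) + \sum_{i=1}^{k} \log \sigma\big(-s(w_i) + \log k P_n(w_i)\big)
\end{aligned}
$$

Negative sampling simply drops the $\log k P_n(w)$ shift, which is why, as Mikolov et al. point out, nce needs the numerical noise probabilities while negative sampling only needs samples.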

My question is what happens when I run wang2vec with non-zero values for both parameters. Will it use only one of them (and if so, which)? Or will the two be combined in some way (if so, how)?

Thanks in advance for your answer!

Kind regards,
Henrik H. Løvold
LTG Group, Uni. of Oslo

Hi Henrik,

Yes, if you set both to a non-zero number, it will maximize both objectives, which generally leads to suboptimal results. Normally, you should use only one of nce, negative sampling, or hierarchical softmax.
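To make "maximize both objectives" concrete, here is a minimal Python sketch under stated assumptions; it is not wang2vec's C code, and every function name in it is hypothetical. For one training example, the negative-sampling and nce log-likelihoods are computed separately and simply added. We assume each objective draws its own noise words and has its own output vectors, with only the input/context vector `h` shared; this mirrors how the original word2vec.c keeps separate output matrices for hierarchical softmax and negative sampling, but we have not checked that wang2vec does the same for nce. Word2vec-style C code also applies gradient updates directly rather than computing an explicit objective value like this.

```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(sigma(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def neg_sampling_objective(h, out_vecs, target, noise):
    """log sigma(s(target)) + sum over noise words of log sigma(-s(w))."""
    obj = log_sigmoid(dot(out_vecs[target], h))
    for w in noise:
        obj += log_sigmoid(-dot(out_vecs[w], h))
    return obj


def nce_objective(h, out_vecs, target, noise, noise_prob):
    """Same form, but every score is shifted by -log(k * P_n(w))."""
    k = len(noise)

    def delta(w):
        return dot(out_vecs[w], h) - math.log(k * noise_prob[w])

    obj = log_sigmoid(delta(target))
    for w in noise:
        obj += log_sigmoid(-delta(w))
    return obj


def combined_objective(h, out_neg, out_nce, target, noise_prob, k_neg, k_nce, rng):
    """Hypothetical: sum of both objectives when both sample counts are > 0.

    Each objective draws its own noise words and uses its own output vectors;
    only the input/context vector h is shared between them.
    """
    vocab = list(noise_prob)
    weights = [noise_prob[w] for w in vocab]
    total = 0.0
    if k_neg > 0:
        noise = rng.choices(vocab, weights=weights, k=k_neg)
        total += neg_sampling_objective(h, out_neg, target, noise)
    if k_nce > 0:
        noise = rng.choices(vocab, weights=weights, k=k_nce)
        total += nce_objective(h, out_nce, target, noise, noise_prob)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    noise_prob = {"the": 0.5, "cat": 0.25, "sat": 0.25}  # toy noise distribution
    dim = 4

    def rand_vec():
        return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

    out_neg = {w: rand_vec() for w in noise_prob}
    out_nce = {w: rand_vec() for w in noise_prob}
    h = rand_vec()
    # Both sample counts non-zero, as in the question: the two log-likelihoods are added.
    print(combined_objective(h, out_neg, out_nce, "cat", noise_prob, 5, 5, rng))
```

Because the shared vector `h` receives gradient from both terms, the two objectives pull it in slightly different directions, which is one plausible reading of why the combination tends to give worse embeddings than either objective alone.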

Good luck with your thesis!

Cheers,
Wang Ling

Thank you, that is very helpful!

Closing this now :)