LIAMF-USP/word2vec-TF

Problem with gensim

felipessalvatore opened this issue · 0 comments

After doing some tests, I got a strange result.

gensim performs better than the TF implementation on a Portuguese corpus ("g" stands for gensim and "tf" stands for TensorFlow; the number in the name is the size of the word embedding):

[image: portuguese_score]

However, after switching to an English corpus, the gensim model has a score close to 0:

[image: english_score]

I don't know why the gensim model performs so badly when we change the language. This is probably a bug in how gensim trains on the corpus; it would be great if someone could address this problem.
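
One check that might help narrow this down is sketched below. It assumes (and I have not confirmed this) that the score depends on the evaluation words being present in the model's vocabulary, so a mismatch such as capitalized evaluation words against a lowercased vocabulary could push the score toward 0. The function `vocab_coverage` and the word lists are hypothetical and are not part of this repository or of gensim.

```python
def vocab_coverage(eval_words, vocab):
    """Return the fraction of evaluation words found in the model vocabulary."""
    eval_words = list(eval_words)
    if not eval_words:
        return 0.0
    vocab = set(vocab)
    found = sum(1 for word in eval_words if word in vocab)
    return found / len(eval_words)


# Hypothetical data: capitalized evaluation words, lowercased vocabulary.
eval_words = ["Athens", "Greece", "Paris", "France"]
lowercased_vocab = {"athens", "greece", "paris", "france"}

# Coverage with the evaluation words used as-is.
raw = vocab_coverage(eval_words, lowercased_vocab)

# Coverage after lowercasing the evaluation words to match the vocabulary.
normalized = vocab_coverage([w.lower() for w in eval_words], lowercased_vocab)

print("raw coverage:", raw)
print("normalized coverage:", normalized)
```

If the coverage for the English gensim model turned out to be much lower than for the TF model, that would point to preprocessing rather than training as the source of the difference.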