reproducing the scores

Question

amblee0306 opened this issue 5 years ago · 2 comments

Hi there,

The paper mentions that an F1 score of ~89% is achievable on the DBLP dataset, but I have not been able to reproduce it. I tested alpha = 0.3 and 0.5, m = 3, and window sizes 6 and 10. Every combination gave an F1 score of approximately 82%. What else can I tune?

Thanks!

Answer 1 · 2020-07-17T13:38:52.000Z

Hello, here are the parameters, using 80% of the data for training:
embedding dimensions: 128
random walk length: 100, number of walks: 10, window size: 10
alpha: 0.5, m: 3
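
For anyone comparing their own runs against these values, the settings above can be kept in one place and diffed against what was actually tried. This is only a minimal sketch: the class name, field names and the `diff_from_reported` helper are hypothetical and are not part of the repository's code.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedSettings:
    """Values quoted in this answer; field names are illustrative only."""
    train_fraction: float = 0.8   # 80% training data
    embedding_dim: int = 128      # embedding dimensions
    walk_length: int = 100        # random walk length
    num_walks: int = 10           # number of walks
    window_size: int = 10         # window size
    alpha: float = 0.5
    m: int = 3


REPORTED = ReportedSettings()


def diff_from_reported(**tried):
    """Return {name: (tried_value, reported_value)} for settings that differ."""
    reported = asdict(REPORTED)
    unknown = set(tried) - set(reported)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    return {k: (v, reported[k]) for k, v in tried.items() if v != reported[k]}


if __name__ == "__main__":
    # One of the combinations mentioned in the question.
    print(diff_from_reported(alpha=0.3, m=3, window_size=6))
```

Settings that the question does not mention (embedding dimensions, walk length, number of walks, training split) are not flagged by the diff, so they are worth checking by hand as well.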

Answer 2 · 2020-08-26T08:44:43.000Z

Hello, a quick comment, since Gensim and other libraries have been updated and some functions are deprecated:
When calling Word2Vec, please make sure that the training algorithm is skip-gram, as the default in some library versions is CBOW.
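
A rough sketch of what that could look like with Gensim, assuming the random walks are already available as lists of string node IDs. The helper names `word2vec_kwargs` and `train_embeddings` are hypothetical; `sg=1` is the Gensim flag that selects skip-gram, and the embedding-size argument is called `size` in Gensim 3.x and `vector_size` in 4.x. The default values are the ones from the first answer.

```python
def word2vec_kwargs(gensim_major, dim=128, window=10):
    """Build Word2Vec keyword arguments that request skip-gram explicitly."""
    kwargs = {
        "window": window,
        "sg": 1,  # 1 = skip-gram, 0 = CBOW
    }
    # Gensim 4.x renamed `size` to `vector_size`.
    size_key = "vector_size" if gensim_major >= 4 else "size"
    kwargs[size_key] = dim
    return kwargs


def train_embeddings(walks):
    """Train on `walks`, a list of walks, each a list of string node IDs."""
    # Imported here so the helper above can be used without Gensim installed.
    import gensim
    from gensim.models import Word2Vec

    major = int(gensim.__version__.split(".")[0])
    return Word2Vec(walks, **word2vec_kwargs(major))


if __name__ == "__main__":
    print(word2vec_kwargs(4))
```

Passing `sg` explicitly avoids depending on whatever default the installed version uses.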