eXascaleInfolab/JUST

Reproducing the scores

amblee0306 opened this issue · 2 comments

Hi there,

The paper reports that an F1 score of about 89% is achievable on the DBLP dataset, but I have not been able to reproduce it. I tested alpha = 0.3 and 0.5, m = 3, and window sizes 6 and 10, and every combination gave an F1 score of roughly 82%. Is there anything else I should tune?
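The thread does not say whether the ~89% and ~82% figures are micro-F1 or macro-F1. On imbalanced label sets the two can differ noticeably, so it helps to make sure both numbers are computed the same way. A minimal pure-Python sketch of both variants for single-label node classification (the function name `f1_scores` is hypothetical and not part of the JUST code):

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # true class t was missed

    # Micro-F1 pools the counts over all classes before computing F1.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * TP / (2 * TP + FP + FN) if TP else 0.0

    # Macro-F1 computes F1 per class, then averages with equal weight per class.
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class) if per_class else 0.0
    return micro, macro


if __name__ == "__main__":
    micro, macro = f1_scores(["a", "a", "a", "b"], ["a", "a", "b", "b"])
    print(round(micro, 4), round(macro, 4))  # prints 0.75 0.7333
```

For single-label predictions, micro-F1 reduces to plain accuracy, while macro-F1 gives rare classes the same weight as common ones.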

Thanks!

Hello, here are the parameters for the setting with 80% of the labeled data used for training:
embedding dimensions: 128
random walk length: 100, number of walks: 10, window size: 10
alpha: 0.5, m: 3
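For reproduction runs it can help to pin these values in one place, together with a seeded 80/20 split of the labeled nodes. A minimal sketch: the dictionary keys and both helper functions are hypothetical names, not the options used by the JUST scripts, and `corpus_size` assumes that `num_walks` walks start at every node (as in DeepWalk-style pipelines), which the thread does not confirm.

```python
import random

# Values from the reply above; key names are illustrative only.
REPORTED_PARAMS = {
    "dimensions": 128,
    "walk_length": 100,
    "num_walks": 10,
    "window_size": 10,
    "alpha": 0.5,
    "m": 3,
    "train_ratio": 0.8,
}


def split_labeled_nodes(nodes, train_ratio=0.8, seed=0):
    """Shuffle labeled nodes with a fixed seed and split them into train/test lists."""
    shuffled = list(nodes)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


def corpus_size(num_nodes, params=REPORTED_PARAMS):
    """Return (walks, max_tokens), ASSUMING num_walks walks start at every node.

    max_tokens is an upper bound: a walk may stop early at a dead end.
    """
    walks = num_nodes * params["num_walks"]
    return walks, walks * params["walk_length"]
```

Fixing the seed makes repeated runs use the same train/test split, so differences in F1 reflect the hyperparameters rather than the split.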

Hello, a quick comment, since Gensim and other libraries have been updated and some functions are now deprecated:
When calling Word2Vec, make sure the training algorithm is skip-gram (sg=1 in Gensim), because the default in some libraries and versions is CBOW (in Gensim, sg=0).
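One way to make that explicit across Gensim versions is to build the keyword arguments yourself. Gensim 4.0 renamed the `size` argument to `vector_size`, which is one of the changes that breaks older scripts. A minimal sketch: the helper names are hypothetical, and `min_count=0` is an assumption chosen here so that rarely visited nodes are not dropped from the vocabulary (Gensim's default is `min_count=5`).

```python
def gensim_major_version(default=4):
    """Return Gensim's major version, or `default` if Gensim is not installed."""
    try:
        import gensim
    except ImportError:
        return default
    return int(gensim.__version__.split(".")[0])


def word2vec_kwargs(dimensions=128, window=10, gensim_major=None):
    """Build Word2Vec keyword arguments that force skip-gram training."""
    if gensim_major is None:
        gensim_major = gensim_major_version()
    kwargs = {
        "window": window,
        "sg": 1,         # 1 = skip-gram; Gensim's default sg=0 is CBOW
        "min_count": 0,  # assumption: keep every node seen in the walks
    }
    # Gensim >= 4.0 calls the embedding size `vector_size`; 3.x calls it `size`.
    size_key = "vector_size" if gensim_major >= 4 else "size"
    kwargs[size_key] = dimensions
    return kwargs
```

Usage, with `walks` being a list of walks, each a list of node IDs as strings: `Word2Vec(sentences=walks, **word2vec_kwargs())` after `from gensim.models import Word2Vec`.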