Is there any difference between node2vec-c and reference implementation of Node2Vec?
dkorkinof opened this issue · 3 comments
I was wondering if there is (algorithmically) any difference between your C++ implementation of Node2Vec and the original reference implementation from Stanford. For instance, I saw something about subsampling frequent nodes in the C++ code.
I'm asking because I tried a few other Node2Vec implementations, including the one in pytorch_geometric, and have had trouble replicating the good performance of your C++ code. So I was wondering if there is anything missing in those.
Can you link the original implementation you are mentioning explicitly? I have reimplemented the official Python code that used the Gensim library underneath. The subsampling actually comes from word2vec implementation in Gensim (for that matter, it's standard word2vec stuff). I have provided some subsampling analysis in Section 3.6 of the VERSE paper. I believe the subsampling is rather key, as it effectively shifts the positive pair distribution.
Yeah, I tried to keep it close to Python implementation (to be honest, there is not much in Python code per se, I was just reimplementing Gensim). I believe this is still one of the fastest implementations available, since I generate the walks on-the-fly and store the precomputed walk probabilities efficiently.