VHRanger/CSRGraph

Faster sampling with Vose's alias method

MaxHalford opened this issue · 2 comments

Hey!

This is not an issue but rather a suggestion :).

From what I understand, you walk on a graph by randomly hopping from one node to another. You do this via a roulette wheel method that runs in a n + log(n) time: n for building the wheel and log(n) for the subsequent binary search. There's an O(1) way to do via Vose's alias method (this is definitely worth a read). I implemented this in Cython here. Naturally this costs a bit of memory, but it might be worth it, I don't know.

Just wanted to make sure you knew about this!

PS: great article on challenging GNNs, that's how I found you :)

Hey Max,

Thanks for the comment!

In our case it's n + log(n) but n is the number of edges for a node. So unless we're working on graphs with a huge average degree, considerations like cache locality and efficient implementation matters most for performance (since n < 50 in most cases we see).

That said, I'll keep the issue open so I can look into benchmarking your proposed method against my "naive" method

Just to track improvements to the random walk sampling in this issue, it should be possible to accelerate the node2vec walks with this:

https://louisabraham.github.io/articles/node2vec-sampling.html