VHRanger/CSRGraph

There is something wrong with node2vec

wangbingnan136 opened this issue · 6 comments

Dear sir,

[screenshot: biased random walk code and output]

When I used the biased random walk, I found that the code seems to have an overflow problem: my largest node ID is 2044, but the results contained some very large integers.
But when I use a plain random walk, like the code below, everything seems fine:

[screenshot: plain random walk code]

Hi,

I have been chasing a segfault bug in the node2vec walks implementation (random walks with return_weight and/or explore_weight different from 1).

Your overflow bug here seems related to this (it would segfault if the overflow points outside of addressable memory).

Could you share data and code to reproduce it? I would be interested in fixing this ASAP.

Hello,

By default, CSRGraph should be able to handle directed graphs.

Could you give me a OneDrive link to the data/code so I can investigate and reproduce your error?

Note that the way you are generating the CSRGraph object is not optimal. You can generate it directly from a networkx graph with G = cg.csrgraph(G), where G is an nx.Graph or nx.DiGraph object, or directly from an edgelist with G = cg.read_edgelist('wiki_edgelist.txt', header=None, sep=',').
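
For example, both construction paths would look roughly like this (a minimal sketch; the karate-club graph here is just a stand-in for your own data):

import csrgraph as cg
import networkx as nx

# Build a csrgraph directly from any networkx graph (directed or undirected)
nxG = nx.karate_club_graph()
G = cg.csrgraph(nxG)

# ...or read an edgelist file straight into a csrgraph
G = cg.read_edgelist('wiki_edgelist.txt', header=None, sep=',')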

If the bug still persists, I would be happy to fix it if I can reproduce it somehow.

I used the "Wiki_edgelist.txt" file from https://github.com/shenweichen/GraphEmbedding/tree/master/data/wiki in this program. The original code is:

import networkx as nx

G = nx.read_edgelist('Wiki_edgelist.txt', create_using=nx.DiGraph(),
                     nodetype=None, data=[('weight', int)])

Then I called the node2vec function with return_weight != 1 and neighbour_weight != 1, and the program crashed.
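
For reference, the failing call looked roughly like this (a sketch; the walk length, epoch count, and the return_weight/neighbor_weight keyword names are my assumptions about csrgraph's random_walks signature):

import csrgraph as cg
import networkx as nx

# Read the wiki edgelist as above, then wrap it in a csrgraph
nxG = nx.read_edgelist('Wiki_edgelist.txt', create_using=nx.DiGraph(),
                       nodetype=None, data=[('weight', int)])
G = cg.csrgraph(nxG)

# Biased (node2vec-style) walks; with both weights != 1 this hits the
# code path that produced the out-of-range node IDs
walks = G.random_walks(walklen=10, epochs=10,
                       return_weight=2., neighbor_weight=0.5)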

May I participate in this project? I think csrgraph is a very good project; it has given us a substantial speedup in our practical applications, and it is the fastest stand-alone graph embedding Python library I have used so far.

I also found this function in nodevectors:
"def generate_neg_edges(G, testing_edges_num, seed=42):
nnodes = len(G.nodes)
# Make a full graph (matrix of 1)
negG = np.ones((nnodes, nnodes))
np.fill_diagonal(negG, 0.)
# Substract existing edges from full graph
# generates negative graph
original_graph = nx.adj_matrix(G).todense()
negG = negG - original_graph
# get negative edges (nonzero entries)
neg_edges = np.where(negG > 0)
random.seed(seed) # replicability!
rng_edges = random.sample(range(neg_edges[0].size), testing_edges_num)
# return edges in (src, dst) tuple format
return list(zip(
neg_edges[0][rng_edges],
neg_edges[1][rng_edges]
))"
The generate_neg_edges function uses too much memory at "negG = np.ones((nnodes, nnodes))", and maybe we should consider some other strategy, e.g. with scipy.sparse, to avoid this problem.
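
For example, something like the rejection-sampling sketch below would avoid materializing the dense nnodes x nnodes matrix entirely (the function name is hypothetical, not part of nodevectors; a scipy.sparse adjacency matrix could serve the same membership test as G.has_edge):

import random

def generate_neg_edges_sampled(G, testing_edges_num, seed=42):
    # Hypothetical alternative: sample candidate (src, dst) pairs and
    # keep only those that are not existing edges, instead of building
    # a dense nnodes x nnodes matrix of ones.
    rng = random.Random(seed)
    nodes = list(G.nodes)
    neg_edges = set()
    while len(neg_edges) < testing_edges_num:
        src, dst = rng.choice(nodes), rng.choice(nodes)
        if src == dst or G.has_edge(src, dst) or (src, dst) in neg_edges:
            continue
        neg_edges.add((src, dst))
    # return edges in (src, dst) tuple format
    return list(neg_edges)

For very dense graphs the rejection loop would slow down, but for sparse graphs like these it stays close to O(testing_edges_num) in both time and memory.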

You're welcome to participate as much as you want!

Just submit pull requests that you think are relevant.

I'll take a look at your code that raises the segfault, thanks.

For generate_neg_edges, I agree it can be optimized a lot. The current implementation follows the one in a biostatistics paper and was built for small (<100k nodes) graphs. We can improve on it.

I caught the bug with the node2vec random walks; it should be fixed for good now. I added a unit test for it.
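
For anyone curious, the kind of check such a regression test can perform looks roughly like this (a sketch, not the actual test in the repo; it assumes random_walks returns an integer array of node indices):

import csrgraph as cg
import networkx as nx
import numpy as np

def test_biased_walks_stay_in_range():
    # Every index returned by a biased (node2vec-style) walk must
    # refer to a real node; out-of-range values would indicate the
    # old overflow bug.
    n_nodes = 200
    G = cg.csrgraph(nx.erdos_renyi_graph(n_nodes, 0.05, seed=42))
    walks = G.random_walks(walklen=20, epochs=5,
                           return_weight=2., neighbor_weight=0.5)
    assert np.min(walks) >= 0
    assert np.max(walks) < n_nodes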