random_walks() seems slow
hananell opened this issue · 1 comments
hananell commented
Hi
I am searching for a quick way to do random walks and saw this package that claims to be good, but when comparing it with naive python approach, the naive approach was much faster... what am I missing?
import networkx as nx
import numpy as np
import csrgraph as cg
from time import time
import random
walk_len = 20
G = nx.fast_gnp_random_graph(100, 0.3, directed=True) # just a random graph
GG = cg.csrgraph(G)
t1 = time()
walks = GG.random_walks(walklen=walk_len)
t2 = time()
walks = np.zeros((len(G.nodes()), walk_len))
for i, node in enumerate(G.nodes()):
cur_node = node
for j in range(walk_len):
neighbor = random.choice(list(G.neighbors(cur_node))) # choose one random neighbor
walks[i][j] = neighbor # add it to walk
cur_node = neighbor # save it to cur_node to continue from it next iteration
t3 = time()
print('cg ', t2-t1)
print('naive ', t3-t2)
output:
cg 5.419240713119507
naive 0.010957479476928711
YLTsai0609 commented
Hi @hananell
some concern here
And I tried another test below:
import networkx as nx
import numpy as np
from time import time
import random
walk_len = 20
trials = 100
warm_start = 15
warm_start_csr_runtime = []
csr_runtime = []
G = nx.fast_gnp_random_graph(100, 0.3, directed=True) # just a random graph
GG = cg.csrgraph(G)
for t in range(warm_start):
tic = time()
walks = GG.random_walks(walklen=walk_len)
toc = time()
warm_start_csr_runtime.append(toc-tic)
print(
warm_start_csr_runtime,
np.mean(warm_start_csr_runtime),
sep='\n\n'
)
gives
[1.511685848236084, 0.0009768009185791016, 0.0008192062377929688, 0.0009591579437255859, 0.001013040542602539, 0.0010869503021240234, 0.0008718967437744141, 0.0007932186126708984, 0.0009889602661132812, 0.0010161399841308594, 0.0010478496551513672, 0.0009562969207763672, 0.001085042953491211, 0.00102996826171875, 0.001010894775390625]
0.10168941815694173
As you see, the first trail took more time.
it seems like there is a compilation issue when executing _random_walk()
Since numba need to compile the data type to static one. so it cause some time when first run.
just do a warm_start to solve this.