index X is out of bounds for axis 0 with size Y when pooling
TMorville opened this issue · 2 comments
First off, thanks for developing and sharing this interesting package. I've forked the repo and all test cases work fine. This is probably related to #14, but I've made a new issue because I have data.
I have an adjacency matrix from a large directed graph. The dimensions of the adjacency matrix are (7919711, 7116242) and the structure is extremely sparse: the number of non-zero elements is 2732656.
When I try to run the pooling on a subset (10000x10000) of my own data, which you can find here (5.07 KB file), I get errors like the following (run on sparse_adj_subset):
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-91-7776269d0a82> in <module>()
----> 1 graphs, perm = coarsening.coarsen(ajd_sparse_ss, levels=3, self_connections=False)
~/projects/erst/graph-embedding/lib/coarsening.py in coarsen(A, levels, self_connections)
8 levels.
9 """
---> 10 graphs, parents = metis(A, levels)
11 perms = compute_perm(parents)
12
~/projects/erst/graph-embedding/lib/coarsening.py in metis(W, levels, rid)
80 cc = idx_col[perm]
81 vv = val[perm]
---> 82 cluster_id = metis_one_level(rr,cc,vv,rid,weights) # rr is ordered
83 parents.append(cluster_id)
84
~/projects/erst/graph-embedding/lib/coarsening.py in metis_one_level(rr, cc, vv, rid, weights)
140 for ii in range(N):
141 tid = rid[ii]
--> 142 if not marked[tid]:
143 wmax = 0.0
144 rs = rowstart[tid]
IndexError: index 9977 is out of bounds for axis 0 with size 9974
And if I rerun, I get
index 9994 is out of bounds for axis 0 with size 9974
showing that the index changes.
I am running with graphs, perm = coarsening.coarsen(ajd_sparse_ss, levels=3, self_connections=False), but setting self_connections=True gives similar problems.
OK. So the switching of indexes seems to stem from lines 55-56
if rid is None:
rid = np.random.permutation(range(N))
which is later used in metis_one_level(rr,cc,vv,rid,weights)
for ii in range(N):
tid = rid[ii]
if not marked[tid]:
wmax = 0.0
rs = rowstart[tid]
marked[tid] = True
bestneighbor = -1
where the bug appears. Here N = rr[nnz-1] + 1, which is 9974 in the test data. This conflicts with the maximum value of rid, 9999, which sets tid. So whenever the loop
for ii in range(N):
tid = rid[ii]
draws an entry of rid that is 9974 or larger, tid gets a value >= N, which is then used to index
marked = np.zeros(N, np.bool)
rowstart = np.zeros(N, np.int32)
rowlength = np.zeros(N, np.int32)
cluster_id = np.zeros(N, np.int32)
but all of those are of length 9974, hence the index error. Here is a printout of a sample of tid values just before a crash:
Value of tid: 322
Value of tid: 2881
Value of tid: 8202
Value of tid: 9726
Value of tid: 8039
Value of tid: 126
Value of tid: 276
Value of tid: 9994
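A minimal sketch of the mismatch described above, using toy sizes rather than the attached data. It assumes rr holds the row indices of the non-zero entries in sorted order (as the `# rr is ordered` comment suggests). The names n_nodes, n_from_rows and out_of_range are made up for illustration and do not appear in coarsening.py.

```python
import numpy as np

# Toy graph with 6 nodes where nodes 4 and 5 have no outgoing edges,
# so they never appear as a row index of a non-zero entry.
n_nodes = 6
rows = np.array([2, 0, 3, 1])   # row index of each non-zero entry
rr = np.sort(rows)              # assumption: rr is the sorted row indices
nnz = len(rr)

# Mirrors N = rr[nnz-1] + 1 from the issue: the size is inferred
# from the largest row index that has at least one entry.
n_from_rows = rr[nnz - 1] + 1   # 4, although the graph has 6 nodes

# The permutation is drawn over all nodes, not over n_from_rows.
rid = np.random.permutation(n_nodes)

# Entries of rid that cannot index an array of length n_from_rows.
out_of_range = rid[rid >= n_from_rows]
print("inferred size:", n_from_rows, "offending ids:", np.sort(out_of_range))
```

Any id in out_of_range would raise the same IndexError when used to index an array of length n_from_rows. Which ids show up first depends on the random permutation, which would explain why the reported index changes between runs.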
Manually allocating the four arrays with the full node count (10000) resolves this error:
marked = np.zeros(10000, np.bool)
rowstart = np.zeros(10000, np.int32)
rowlength = np.zeros(10000, np.int32)
cluster_id = np.zeros(10000, np.int32)
but then yields yet another error:
AssertionError Traceback (most recent call last)
<ipython-input-6-7776269d0a82> in <module>()
----> 1 graphs, perm = coarsening.coarsen(ajd_sparse_ss, levels=3, self_connections=False)
~/projects/erst/graph-embedding/lib/coarsening.py in coarsen(A, levels, self_connections)
9 """
10 graphs, parents = metis(A, levels)
---> 11 perms = compute_perm(parents)
12
13 for i, A in enumerate(graphs):
~/projects/erst/graph-embedding/lib/coarsening.py in compute_perm(parents)
199 indices_node = list(np.where(parent == i)[0])
200 print("Len of indices_node", len(indices_node))
--> 201 assert 0 <= len(indices_node) <= 2
202 #print('indices_node: {}'.format(indices_node))
203
AssertionError:
which happens because the length of indices_node is 1208, but the assertion allows at most 2. Perhaps this needs its own tracker?
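To see which clusters trip the assertion without stopping at the first one, the check in compute_perm can be approximated with a small diagnostic. This is only a sketch: oversized_clusters is a hypothetical helper, not part of coarsening.py, and it assumes each entry of parents is a 1-D array of non-negative integer cluster ids.

```python
import numpy as np

def oversized_clusters(parent, max_children=2):
    """Return {cluster id: child count} for clusters with too many children."""
    parent = np.asarray(parent)
    # counts[i] is the number of nodes whose parent is cluster i.
    counts = np.bincount(parent)
    bad = np.flatnonzero(counts > max_children)
    return {int(i): int(counts[i]) for i in bad}

# Toy example: cluster 0 receives four children, which would fail
# an assertion of the form 0 <= len(indices_node) <= 2.
toy_parent = np.array([0, 0, 0, 0, 1, 1, 2])
report = oversized_clusters(toy_parent)
print(report)
```

Running this on each array in parents before calling compute_perm would show whether the 1208 children all land in a single cluster id or are spread over several, which may help narrow down where the coarsening goes wrong.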
@TMorville I have the same problem. What did you end up doing?