muhanzhang/pytorch_DGCNN

Error arises when running the program

jingcaiguo opened this issue · 7 comments

python: src/lib/msg_pass.cpp:20: void n2n_construct(GraphStruct*, long long int*, Dtype*): Assertion `nnz == (int)graph->num_edges' failed.

Could you help check this problem?

This seems to be related to graph construction. Which data are you using? Can you check whether your graph is in the correct format? For example, check whether each node line starts with the node tag t and the neighbor count m, followed by the neighbor indices, and whether there are exactly m neighbor indices, matching the m you specified.
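For reference, this check can be sketched in plain Python. The layout assumed below (first line = number of graphs; per graph, a header "n label" followed by n node lines "tag m nb_1 ... nb_m") matches the usual DGCNN txt data files; extra columns such as node attributes after the m neighbors are not handled here.

```python
# Sketch of a format check for the assumed DGCNN txt layout.
def check_dgcnn_txt(lines):
    it = iter(lines)
    num_graphs = int(next(it))
    for g in range(num_graphs):
        n, label = map(int, next(it).split())  # per-graph header: "n label"
        for v in range(n):
            row = next(it).split()
            tag, m = int(row[0]), int(row[1])
            neighbors = row[2:]
            # each node line must list exactly m neighbor indices
            if len(neighbors) != m:
                raise ValueError('graph %d, node %d: expected %d neighbors, got %d'
                                 % (g, v, m, len(neighbors)))
    return True
```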

I am getting the same error message. Could you elaborate on what nnz and (int)graph->num_edges in the error message refer to? What is the requirement that the n2n_construct() function is testing here?

I am able to generate my test/training graphs of type <class '__main__.GNNGraph'> with the following attributes:

print(len(train_graphs[0].node_features)) # 254 [[1.14e+02 1.60e+00]
                                          #      [9.33e+02 3.35e+00]
                                          #      [4.74e+02 8.00e-01]
                                          #      [6.26e+02 3.52e+00]...]
print(len(train_graphs[0].degs)) # 254 [4, 6, 8, 6, 1, 7, ...]
print(train_graphs[0].num_nodes) # 254
print(train_graphs[0].label) # 0
print(len(train_graphs[0].edge_pairs)) # 2172, [114 933 ... 381 794]
print(train_graphs[0].edge_features) # None
print(train_graphs[0].num_edges) # 1086
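As an aside, these attributes obey a few invariants that can be tested in plain Python before calling into the C++ library. The helper below is a hypothetical sketch, based only on how GNNGraph stores each undirected edge as a flattened (u, v) pair; note that the edge_pairs printed above contain values such as 933 while num_nodes is 254, which the range check would flag if those values are meant to be 0-based node indices.

```python
# Hypothetical sanity checks for a GNNGraph-like object (assumes
# edge_pairs holds 0-based node indices, flattened (u, v) pairs).
def check_gnn_graph(g):
    assert len(g.node_features) == g.num_nodes
    assert len(g.degs) == g.num_nodes
    assert len(g.edge_pairs) == 2 * g.num_edges  # one (u, v) pair per edge
    assert sum(g.degs) == 2 * g.num_edges        # handshake lemma
    assert max(g.edge_pairs) < g.num_nodes       # indices in 0..num_nodes-1
    return True
```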

The error message is as above:

python: src/lib/msg_pass.cpp:20: void n2n_construct(GraphStruct*, long long int*, Dtype*): Assertion `nnz == (int)graph->num_edges' failed.
Aborted (core dumped)

Thanks for your support!

So, in my case I can narrow the error down to the following part of gnn_lib.py.

    def PrepareSparseMatrices(self, graph_list, is_directed=0):
        assert not is_directed
        total_num_nodes, total_num_edges = self._prepare_graph(graph_list, is_directed)

        n2n_idxes = torch.LongTensor(2, total_num_edges * 2)
        n2n_vals = torch.FloatTensor(total_num_edges * 2)

        e2n_idxes = torch.LongTensor(2, total_num_edges * 2)
        e2n_vals = torch.FloatTensor(total_num_edges * 2)

        subg_idxes = torch.LongTensor(2, total_num_nodes)
        subg_vals = torch.FloatTensor(total_num_nodes)

        idx_list = (ctypes.c_void_p * 3)()
        idx_list[0] = n2n_idxes.numpy().ctypes.data
        idx_list[1] = e2n_idxes.numpy().ctypes.data
        idx_list[2] = subg_idxes.numpy().ctypes.data

        val_list = (ctypes.c_void_p * 3)()
        val_list[0] = n2n_vals.numpy().ctypes.data
        val_list[1] = e2n_vals.numpy().ctypes.data
        val_list[2] = subg_vals.numpy().ctypes.data

        ##########################
        print(total_num_nodes) # 960
        print(total_num_edges)  # 5317
        print(len(n2n_idxes))   # 2
        print(len(n2n_vals)) # 10634

        print(len(e2n_idxes))   # 2
        print(len(e2n_vals)) # 10634
        print(len(subg_idxes)) #2
        print(len(subg_vals)) # 960
        print(len(graph_list)) # 10
        ##########################

        print('Prepare Sparse Matrix start')
        self.lib.PrepareSparseMatrices(self.batch_graph_handle,
                                ctypes.cast(idx_list, ctypes.c_void_p),
                                ctypes.cast(val_list, ctypes.c_void_p))
        print('Prepare Sparse Matrix end')

With the output:

Initiate Sparse Matrix
960
5317
2
10634
2
10634
2
960
10
Prepare Sparse Matrix start
python: src/lib/msg_pass.cpp:20: void n2n_construct(GraphStruct*, long long int*, Dtype*): Assertion `nnz == (int)graph->num_edges' failed.
Aborted (core dumped)

Any suggestions what is going on here?
Thanks!

Hi! Sorry for the late reply. nnz means the number of nonzeros in the constructed sparse adjacency matrix, and (int)graph->num_edges means the number of edges in your Python-constructed graph object. This function transforms the Python graph into a C++ sparse matrix, and the assert is a sanity check that the two numbers match.
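To illustrate the mechanism (a sketch, not the actual C++ code): when an undirected edge is stored in both directions, the symmetric adjacency matrix should get exactly two nonzero entries per edge, but duplicate edges and self-loops collapse entries and break that count.

```python
# Count distinct nonzero entries of a symmetric adjacency matrix built
# from an edge list. For a clean undirected graph this equals
# 2 * num_edges; duplicates and self-loops make it smaller.
def adjacency_nnz(edge_pairs):
    entries = set()
    for u, v in edge_pairs:
        entries.add((u, v))
        entries.add((v, u))  # symmetric counterpart
    return len(entries)
```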

My suggestion is to check whether your input graph has duplicate edges or self-loops. Sometimes nx.Graph() handles them differently from the C++ library used here. Please remove duplicate edges and self-loops from your data and try again.
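A minimal pure-Python sketch of that cleanup, applied directly to an edge list (treating (u, v) and (v, u) as the same undirected edge; the helper name is hypothetical):

```python
# Drop self-loops and duplicate (possibly reversed) undirected edges.
def clean_edges(edge_pairs):
    seen = set()
    cleaned = []
    for u, v in edge_pairs:
        if u == v:                      # self-loop
            continue
        key = (min(u, v), max(u, v))    # canonical form of the edge
        if key in seen:                 # duplicate edge
            continue
        seen.add(key)
        cleaned.append((u, v))
    return cleaned
```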

Hey Muhan, good thinking, and thanks, I really appreciate the explanation! Unfortunately, removing duplicate edges and self-loops did not solve the problem. I can get the pipeline running without problems if I convert my networkx graphs into the Matlab text file format and then import that text file again. But that seems quite inefficient for my task, considering I already have networkx graphs, and you are also just converting the Matlab text file into networkx graph objects before running them through the GNNGraph function.

If I print the graph characteristics after running my networkx graphs through your GNNGraph function, without the Matlab text file conversion:

import numpy as np

class GNNGraph(object):
    def __init__(self, g, node_feat):
        self.node_tags = list(g.nodes)
        self.num_nodes = len(self.node_tags)
        self.label = g.y
        self.node_features = node_feat
        self.degs = list(dict(g.degree).values())
        self.edge_features = None

        if len(g.edges()) != 0:
            x, y = zip(*g.edges())
            self.num_edges = len(x)
            self.edge_pairs = np.ndarray(shape=(self.num_edges, 2), dtype=np.int32)
            self.edge_pairs[:, 0] = x
            self.edge_pairs[:, 1] = y
            self.edge_pairs = self.edge_pairs.flatten()
        else:
            self.num_edges = 0
            self.edge_pairs = np.array([])

        print('New graph')
        print('Number of nodes: {0}'.format(self.num_nodes))
        print('Number of node tags: {0}'.format(len(self.node_tags)))
        print('Node tags: {0}'.format(self.node_tags))
        print('Node tags type: {0}'.format(type(self.node_tags)))
        print('Node feature length: {0}'.format(len(self.node_features)))
        print('Node feature type: {0}'.format(type(self.node_features)))
        print('Node features: {0}'.format(self.node_features))
        print('Number of edges: {0}'.format(self.num_edges))
        print('Length edge pairs: {0}'.format(len(self.edge_pairs)))
        print('Type of edge pairs: {0}'.format(type(self.edge_pairs)))
        print('Edge pairs: {0}'.format(self.edge_pairs))
        print('Label: {0}'.format(self.label))

The output is

New graph
Number of nodes: 81
Number of node tags: 81
Node tags: [79, 47, 52, 26, 73, 77, 81, 74, 2, 5, 70, 24, 63, 1, 18, 39, 48, 72, 71, 61, 31, 40, 46, 42, 29, 58, 69, 19, 78, 27, 6, 11, 45, 32, 12, 60, 14, 57, 25, 66, 3, 49, 7, 59, 53, 16, 67, 21, 50, 55, 54, 44, 13, 4, 36, 43, 35, 10, 22, 75, 38, 23, 20, 37, 15, 65, 33, 51, 68, 8, 41, 76, 80, 28, 34, 62, 64, 30, 17, 56, 9]
Node tags type: <class 'list'>
Node feature length: 81
Node feature type: <class 'numpy.ndarray'>
Node features: [[1.26860e+01]
 [1.22515e+01]
 [1.25422e+01]
 [1.79270e+00]
 [1.17031e+01]
 [1.10693e+01]
 [4.40550e+00]
 [3.46920e+00]
 [3.66020e+00]
 [5.21040e+00]
 [2.03350e+00]
 [1.22147e+01]
 [9.26360e+00]
 [2.36480e+00]
 [1.93570e+00]
 [9.59690e+00]
 [4.97320e+00]
 [4.40630e+00]
 [1.92700e+00]
 [6.50100e-01]
 [2.47380e+00]
 [2.20550e+00]
 [1.49770e+00]
 [2.31440e+00]
 [4.71200e+00]
 [6.14120e+00]
 [5.54830e+00]
 [8.61950e+00]
 [4.68700e+00]
 [8.90920e+00]
 [8.98640e+00]
 [9.44690e+00]
 [3.28840e+00]
 [3.29900e+00]
 [1.00760e+01]
 [1.50000e-03]
 [6.10610e+00]
 [6.93300e-01]
 [4.26350e+00]
 [5.66050e+00]
 [5.05700e+00]
 [9.93870e+00]
 [5.32800e-01]
 [7.72990e+00]
 [8.56790e+00]
 [7.73990e+00]
 [1.04963e+01]
 [6.41460e+00]
 [1.04960e+01]
 [3.33070e+00]
 [1.15685e+01]
 [1.14723e+01]
 [1.13877e+01]
 [6.97700e-01]
 [1.13092e+01]
 [2.45940e+00]
 [8.22590e+00]
 [8.36690e+00]
 [3.82700e-01]
 [1.05975e+01]
 [3.12800e+00]
 [6.11730e+00]
 [4.79870e+00]
 [5.59120e+00]
 [8.12950e+00]
 [6.39330e+00]
 [4.42680e+00]
 [9.30840e+00]
 [7.43170e+00]
 [4.50030e+00]
 [2.49620e+00]
 [9.21130e+00]
 [7.30900e+00]
 [2.71750e+00]
 [1.11623e+01]
 [8.10230e+00]
 [6.44780e+00]
 [1.17065e+01]
 [1.52510e+00]
 [2.94680e+00]
 [6.02380e+00]]
Number of edges: 1645
Length edge pairs: 3290
Type of edge pairs: <class 'numpy.ndarray'>
Edge pairs: [79 47 79 ...  9 17  9]
Label: 1

Is it a problem that the node tags, and hence the edge pairs, are not sequential integers starting from 1? I am comparing graphs with string node labels, so I build a dictionary to store the strings and replace each string with an integer. Some nodes appear exclusively in some graphs and are absent in others, so those node tags are missing from a given graph.
Thanks again for your support!
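For what it's worth, mapping arbitrary (e.g. string) node labels to consecutive integers starting from 0 can be sketched like this (the helper is hypothetical; networkx also offers nx.convert_node_labels_to_integers for the same purpose):

```python
# Map arbitrary node labels to consecutive integers 0..n-1 and rewrite
# the edge list accordingly.
def relabel_consecutive(nodes, edges):
    mapping = {name: i for i, name in enumerate(sorted(nodes))}
    new_edges = [(mapping[u], mapping[v]) for u, v in edges]
    return mapping, new_edges
```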

The sequential labelling is not causing the problem: the error persists even when I relabel the nodes so that, after passing the graphs through the GNNGraph function, they look like this:

New graph
Number of nodes: 81
Number of node tags: 81
Node tags: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80]
Node tags type: <class 'list'>
Node feature length: 81
Node feature type: <class 'numpy.ndarray'>
Node features: [[ 1.7591]
 [ 8.2386]
 [ 2.5039]
 [ 1.5928] and so on]
Number of edges: 1645
Length edge pairs: 3290
Type of edge pairs: <class 'numpy.ndarray'>
Edge pairs: [ 0  1  0 ... 63 80 64]
Label: 1

If I remove the assert statements to track down the actual error message afterwards, I get:

n2n_sp = torch.sparse.FloatTensor(n2n_idxes, n2n_vals, torch.Size([total_num_nodes, total_num_nodes]))
RuntimeError: size is inconsistent with indices: for dim 0, size is 49781 but found index 94661596694512

I found the following thread, but I am working with the most recent PyTorch version:
https://discuss.pytorch.org/t/runtimeerror-sizes-is-inconsistent-with-indices-pytorch-0-4-1/94181
Many thanks for your help, Muhan!

If the txt route works, can you check what is different between a networkx graph built directly and a networkx graph constructed from the txt format? For example, select the same graph and compare the node tags, feature lengths, numeric types, etc. I guess there is some subtle reason causing a discrepancy between the two ways.
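A hypothetical field-by-field diff along those lines (attribute names follow the GNNGraph class quoted above; dtype is compared as well, since a mismatch such as int32 vs int64 is a classic source of trouble once a buffer is handed to the C++ side via ctypes):

```python
import numpy as np

# Report which GNNGraph-like attributes differ between a directly
# built graph and one loaded from the txt format.
def diff_graphs(g1, g2):
    diffs = []
    for a in ('num_nodes', 'num_edges', 'label', 'node_tags', 'degs'):
        if getattr(g1, a) != getattr(g2, a):
            diffs.append(a)
    for a in ('edge_pairs', 'node_features'):
        a1, a2 = np.asarray(getattr(g1, a)), np.asarray(getattr(g2, a))
        # dtype matters too: the C++ side reads the raw buffer
        if (a1.dtype != a2.dtype or a1.shape != a2.shape
                or not np.array_equal(a1, a2)):
            diffs.append(a)
    return diffs
```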