random_walk_cuda is causing an illegal memory access
ProfDoof opened this issue · 7 comments
Hi,
When running the following code, I get an illegal memory access error with the graph below. I am not sure why, and I do not understand the algorithm or the C++ code well enough to track it down. I do not get the error when I set device to 'cpu'.
I'm using the nightly build of PyG installed through a locally built conda package, and version 1.6.1 of torch-cluster.
from torch_geometric.data import Data
from torch_geometric.utils import to_networkx
from networkx.drawing.nx_agraph import write_dot
import torch

# Small directed graph with 7 nodes; node 3 has no outgoing edges and node 6
# is completely isolated (no incoming or outgoing edges).
new_node_ids = list(range(7))
sources = [0, 1, 2, 2, 4, 5]
targets = [1, 2, 3, 4, 5, 2]

data = Data(torch.tensor(new_node_ids), torch.tensor([sources, targets]))
data.num_nodes = 7
write_dot(to_networkx(data), 'test_test.dot')

device = 'cuda'
# Convert the edge index to CSR (rowptr, col) on the GPU.
rowptr, col, perm = data.to(device).csr()
print(rowptr, col)

# Start one walk of length 10 from every node, with p=2 and q=4.
start_indices = torch.arange(0, data.num_nodes, dtype=torch.long).to(device)
print(torch.ops.torch_cluster.random_walk(rowptr, col, start_indices,
                                          10, 2, 4))
EDIT:
Here's the output and the error I get:
tensor([0, 1, 2, 4, 4, 5, 6, 6], device='cuda:0') tensor([1, 2, 3, 4, 5, 2], device='cuda:0')
Traceback (most recent call last):
File "/home/john/Research/EmbeddingGraphs/cfg2vec/gnn/test.py", line 25, in <module>
print(torch.ops.torch_cluster.random_walk(rowptr, col, start_indices,
File "/home/john/mambaforge/envs/gnn/lib/python3.9/site-packages/torch/_ops.py", line 503, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
This seems to be failing because node 6 is an isolated node, so setting data.num_nodes = 6 should fix this.
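For reference, here is a rough sketch of what I suspect happens for the isolated node; this is only a simplified model of the neighbor lookup, not the actual CUDA kernel. In the CSR output above, the neighbor range of node 6 is [rowptr[6], rowptr[7]) = [6, 6), which is empty and starts exactly at the end of col, so an unguarded read of col[rowptr[6]] falls one element past the end of the array:

import torch

# CSR of the example graph, copied from the printed output above.
rowptr = torch.tensor([0, 1, 2, 4, 4, 5, 6, 6])
col = torch.tensor([1, 2, 3, 4, 5, 2])

v = 6                                       # the isolated node
lo, hi = int(rowptr[v]), int(rowptr[v + 1])
print(lo, hi, col.numel())                  # 6 6 6 -> empty range at the end of col
# col[lo] here would read past the end of col; on the GPU that surfaces as an
# illegal memory access instead of a Python IndexError.

Node 3 also has an empty neighbor range, but its range starts at 4, which is still inside col, which may explain why only node 6 triggers the crash.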
This is a minimal example; the actual graph is more complicated and I can't remove the isolated nodes. Also, this doesn't fail for any other values of p or q, and it only happens in the CUDA version, not the CPU version. All that being said, I'm not sure what exactly is going on.
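In the meantime, the only workaround I can think of on my side is to give every node with no outgoing edges a self-loop before building the CSR, so that no node ends up with an empty neighbor range. This is just an untested sketch on top of the data object from the example above, and it does change the walk semantics slightly for those nodes (walks starting there now step onto themselves):

import torch

# Find nodes with no outgoing edges (out-degree 0).
out_deg = torch.zeros(data.num_nodes, dtype=torch.long)
out_deg.scatter_add_(0, data.edge_index[0], torch.ones_like(data.edge_index[0]))
dangling = (out_deg == 0).nonzero(as_tuple=False).view(-1)

# Add a self-loop for each such node so a walk can always "stay put".
self_loops = dangling.repeat(2, 1)          # shape [2, num_dangling]
data.edge_index = torch.cat([data.edge_index, self_loops], dim=1)

I have not checked whether this is actually a supported way to handle isolated nodes, so let me know if that is a bad idea.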
@rusty1s just wanted to check if you had the chance to see this yet this evening.
Will take a look soon.
Wondering if there are any updates on this issue.
Not yet, sorry for the delay.
This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?