How much memory is needed for the ogbn_products test?
saladcat opened this issue · 4 comments
How much memory is needed for the ogbn_products test phase?
In the directory "examples/ogb/ogbn_products" I run
python test.py --self_loop --num_layers 14 --gcn_aggr softmax_sg --t 0.1
and PyCharm returns
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
It seems to be an OOM kill, but I have 160 GB+ of memory. Is that not enough for the test?
Hi @saladcat. I am sorry that the current implementation of full-batch testing is not efficient. It takes 405 GB of RAM to do inference on the whole graph. There are some inherent limitations of the edge_index implementation; see https://pytorch-geometric.readthedocs.io/en/latest/notes/sparse_tensor.html for details. We can try to contact the PyG team to see whether we can also implement our aggregation function in the form of SparseTensor. For now, please allocate more RAM for testing. Thanks.
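For anyone curious what the SparseTensor approach linked above looks like in practice, here is a minimal, hypothetical sketch (the graph size, edges, and the conv layer are placeholders; the softmax_sg aggregation used in this repository is not shown, since per this thread it has not yet been implemented on top of SparseTensor):

import torch
from torch_sparse import SparseTensor

num_nodes = 100                                       # placeholder graph size
edge_index = torch.randint(0, num_nodes, (2, 500))    # placeholder edges

# Build the (transposed) sparse adjacency matrix once; message passing with
# adj_t avoids materialising per-edge message tensors the way the plain
# edge_index path does, which is where most of the memory goes.
adj_t = SparseTensor(row=edge_index[0], col=edge_index[1],
                     sparse_sizes=(num_nodes, num_nodes)).t()

# A PyG layer that supports SparseTensor is then called as
#   out = conv(x, adj_t)
# instead of
#   out = conv(x, edge_index)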
Why not just do the same as in the training phase?
# Training loop: visit the graph partitions in random order and
# train on each subgraph as a mini-batch.
idx_clusters = np.arange(len(sg_nodes))
np.random.shuffle(idx_clusters)

for idx in idx_clusters:
    # features and edges of the current subgraph
    x_ = x[sg_nodes[idx]].to(device)
    sg_edges_ = sg_edges[idx].to(device)

    # map global node ids to local positions within the subgraph
    mapper = {node: idx for idx, node in enumerate(sg_nodes[idx])}

    # restrict the loss to training nodes that fall inside this subgraph
    inter_idx = intersection(sg_nodes[idx], train_idx)
    training_idx = [mapper[t_idx] for t_idx in inter_idx]

    optimizer.zero_grad()
    pred = model(x_, sg_edges_)
    target = train_y[inter_idx].to(device)
    loss = F.nll_loss(pred[training_idx], target)
    loss.backward()
    optimizer.step()
    loss_list.append(loss.item())
I mean, are there any disadvantages to doing so? Sorry, I am new to GCNs.
No worries. During training we partition the graph into small subgraphs. However, this loses some edge information, since edges between partitions are dropped. That is why we do full-batch testing. You can also do mini-batch testing by partitioning the graph, but you may observe a drop in performance.
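For anyone who does want to try mini-batch testing along the lines suggested above, a rough sketch could look like the following. It reuses the partition variables (sg_nodes, sg_edges, x, model, device) from the training snippet quoted earlier and assumes num_classes is known; edges crossing partition boundaries are dropped, which is exactly the information loss mentioned above.

import torch

model.eval()
all_preds = torch.zeros(x.size(0), num_classes)    # predictions for every node

with torch.no_grad():                              # no gradients needed at test time
    for idx in range(len(sg_nodes)):
        x_ = x[sg_nodes[idx]].to(device)           # features of this subgraph
        sg_edges_ = sg_edges[idx].to(device)       # intra-partition edges only
        pred = model(x_, sg_edges_)
        all_preds[sg_nodes[idx]] = pred.cpu()      # scatter back to global node ids

# all_preds can then be evaluated on the test split (e.g. with the OGB
# Evaluator), accepting the possible drop in accuracy described above.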
Fast reply!
I will try. Thank you! 😁