qitianwu/SGFormer

Running out of memory on ogbn-arxiv

Closed this issue · 1 comment

Hi,

I am currently trying to run the script for ogbn-arxiv, but I am running out of memory. Right now, it attempts a matrix multiplication between two tensors that are each 169k by 256, and it tries to allocate 100 GB of VRAM to do so.

I am wondering what settings I need to change in the script to get it to run?

Specifically, the error occurs at this line:

attention_num = torch.sigmoid(torch.einsum("nhm,lhm->nlh", qs, ks)) # [N, L, H]
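
For reference, a quick back-of-the-envelope check (assuming a single float32 head, which is an illustrative guess on my part) matches that allocation, since the einsum output is [N, L, H] with N = L ≈ 169k:

N = L = 169_343              # number of nodes in ogbn-arxiv
H = 1                        # head count, illustrative assumption
print(N * L * H * 4 / 1e9)   # ~115 GB just for the float32 [N, L, H] score tensor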

Am I supposed to use main-batch.py instead of main.py?

Thanks!

Thanks for letting us know about this issue. This is indeed a mistake in the original code, which adopts the quadratic attention and comments out the linear attention. The bug has now been fixed with the correct version of the linear attention we actually used for SGFormer. The code should run smoothly on the large datasets without OOM (for arxiv, we use main.py, and for the other large datasets, we use main-batch.py).
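
For anyone hitting the same error before pulling the fix, the key difference is that linear attention never materializes the N x N score matrix. Below is a minimal sketch of that reordering trick; the feature map, normalization, and shapes are illustrative assumptions, not the exact code used in SGFormer:

import torch
import torch.nn.functional as F

N, H, M, D = 2048, 1, 64, 64            # toy sizes; the real arxiv run has N ~ 169k
qs = F.elu(torch.randn(N, H, M)) + 1    # nonnegative feature maps (assumption,
ks = F.elu(torch.randn(N, H, M)) + 1    # not necessarily SGFormer's choice)
vs = torch.randn(N, H, D)

# Quadratic attention (the buggy path): materializes an [N, N, H] score tensor,
# roughly N^2 * 4 bytes per float32 head, which OOMs on arxiv-sized graphs.
# scores = torch.sigmoid(torch.einsum("nhm,lhm->nlh", qs, ks))

# Linear attention: reorder the contractions so the largest intermediate is
# [H, M, D]; memory scales as O(N * M * D) instead of O(N^2).
kv = torch.einsum("lhm,lhd->hmd", ks, vs)                    # [H, M, D]
numerator = torch.einsum("nhm,hmd->nhd", qs, kv)             # [N, H, D]
denominator = torch.einsum("nhm,hm->nh", qs, ks.sum(dim=0))  # [N, H]
out = numerator / (denominator.unsqueeze(-1) + 1e-6)         # [N, H, D]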