snap-stanford/SATURN

train-saturn.py debugging

otoky opened this issue · 2 comments

otoky commented

Hi! I followed the tutorial up to the train-saturn section. I am using Google Colab on a GPU-enabled virtual machine and have pip-installed all the dependencies.
(!pip install torch==1.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html)

When I run it, I get this error:
File "/gdrive/MyDrive/SATURN/files/train-saturn.py", line 1050, in <module>
torch.cuda.set_device(args.device_num)
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 404, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I'm not very well versed in this, but is there an issue with how the code recognizes which GPU to use?

Yanay1 commented

I think "invalid device ordinal" means the script is trying to set the device number to a GPU index that does not exist on the machine. Try changing device_num to 0.
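
A minimal sketch of that check, assuming PyTorch is importable in the Colab runtime. `visible_gpu_count` and `resolve_device_num` are hypothetical helpers, not part of SATURN, and the fallback to index 0 is an assumption that suits a single-GPU machine:

```python
try:
    import torch
except ImportError:  # lets the sketch run even where PyTorch is missing
    torch = None


def visible_gpu_count() -> int:
    """Number of CUDA devices PyTorch can see (0 if none, or if PyTorch is absent)."""
    if torch is None or not torch.cuda.is_available():
        return 0
    return torch.cuda.device_count()


def resolve_device_num(requested: int, available: int) -> int:
    """Hypothetical helper: return a device index that is valid for `available` GPUs."""
    if available <= 0:
        raise RuntimeError("No CUDA device is visible to PyTorch")
    if 0 <= requested < available:
        return requested
    # An out-of-range index is what triggers "invalid device ordinal";
    # falling back to 0 is an assumption, not SATURN's behavior.
    return 0


count = visible_gpu_count()
print(f"Visible CUDA devices: {count}")
if count > 0:
    # Example: a requested index of 1 on a one-GPU machine resolves to 0.
    print(f"Device index to use: {resolve_device_num(1, count)}")
```

Valid indices run from 0 to one less than the number of visible devices, so a machine with a single GPU only accepts 0.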

Otherwise, there might be an issue with the torch install.
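
To rule out the install, something like the following could be run in a notebook cell. It is only a sketch: `torch_install_report` is a hypothetical helper, and it relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` to show which build is actually loaded:

```python
try:
    import torch
except ImportError:  # report a missing install instead of crashing
    torch = None


def torch_install_report() -> dict:
    """Hypothetical helper: summarize the PyTorch build the runtime has loaded."""
    if torch is None:
        return {
            "installed": False,
            "torch_version": None,
            "cuda_build": None,
            "cuda_available": False,
        }
    return {
        "installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


for key, value in torch_install_report().items():
    print(f"{key}: {value}")
```

A `cuda_build` of `None` indicates a CPU-only build, and `cuda_available: False` means PyTorch cannot see any GPU from this runtime.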

otoky commented

I think I got it to work, thanks!