EndlessSora/TSIT

Other GPU ids throw error

sandeepjangir07 opened this issue · 4 comments

When I use any GPU device ID other than 0, the code throws an error:
"
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    opt = TestOptions().parse()
  File "/home/jang_sa/phd/AI/domain_adaptation/TSIT/options/base_options.py", line 178, in parse
    torch.cuda.set_device(opt.gpu_ids[0])
  File "/home/jang_sa/Software/anaconda3/envs/tsit/lib/python3.7/site-packages/torch/cuda/__init__.py", line 263, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
"
The GPUs are available and the device IDs are valid, but I still get the error. Is there a solution for this problem?

It seems to work on my side. How many GPUs do you have?

I have two GPU clusters, one with 8 GPUs and one with 5, but whenever I use CUDA_VISIBLE_DEVICES=[anything other than 0] and gpu_id=(anything other than 0) I get this error. I will try to sit down and debug it today, but if you have any hint of what's causing this, it would be very helpful.

Thanks

For example, when you modify the --gpu_ids 0 here to --gpu_ids 1, will it cause an error?

Hi,
Yes, it does. I cannot run inference on any other GPU either.
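
One possible cause, given that CUDA_VISIBLE_DEVICES and the GPU ID were both set to the same non-zero value: CUDA renumbers the visible devices from 0 inside the process. With CUDA_VISIBLE_DEVICES=1 the process sees a single device at ordinal 0, so an ID of 1 is out of range. The thread does not confirm that this is what happened here. The sketch below only illustrates the renumbering in plain Python. The helper names are hypothetical, and it assumes the variable holds a comma-separated list of valid integer indices (no UUIDs).

```python
def visible_device_count(cuda_visible_devices, physical_count):
    """Return how many GPUs a process would see.

    cuda_visible_devices: the value of CUDA_VISIBLE_DEVICES, or None if unset.
    physical_count: number of GPUs physically installed (assumed known).
    Assumption: every listed entry is a valid integer index.
    """
    if cuda_visible_devices is None:
        # Variable unset: all physical devices are visible.
        return physical_count
    entries = [e.strip() for e in cuda_visible_devices.split(",") if e.strip()]
    return len(entries)


def is_valid_ordinal(gpu_id, cuda_visible_devices, physical_count):
    """Check whether gpu_id is a valid ordinal inside the process.

    Visible devices are renumbered 0..n-1, so the check is against the
    number of visible devices, not against the physical index.
    """
    return 0 <= gpu_id < visible_device_count(cuda_visible_devices, physical_count)


if __name__ == "__main__":
    # 8 physical GPUs, only physical GPU 1 exposed to the process.
    print(is_valid_ordinal(1, "1", 8))   # ordinal 1 does not exist
    print(is_valid_ordinal(0, "1", 8))   # physical GPU 1 is ordinal 0
    # Variable unset: ordinal 1 is simply physical GPU 1.
    print(is_valid_ordinal(1, None, 8))
```

If this is the cause, either set CUDA_VISIBLE_DEVICES=1 together with --gpu_ids 0, or leave CUDA_VISIBLE_DEVICES unset and pass --gpu_ids 1. In a real run, `torch.cuda.device_count()` reports the number of visible devices.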