Feature Request: Add GPU selection argument for CLI
luigifcruz opened this issue · 10 comments
This is a feature requested on Slack: an argument to select which GPU device the current instance will use. For example, --gpu-id 0.
I assign myself to this issue.
For what it's worth, in my experience, setting the CUDA_VISIBLE_DEVICES environment variable, as in CUDA_VISIBLE_DEVICES=3 turboSETI filename.h5, does not always seem to work. Sometimes it allocates data to GPU 0 anyway.
Unless CUDA_VISIBLE_DEVICES is actively being removed from the environment (possibly by not being passed through an SSH connection?), all CUDA apps should be constrained by that setting. If you can come up with a minimal working example to the contrary, it would probably be worth a bug report to NVIDIA.
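One way to rule out the environment being dropped is to check what a child process actually sees. The following is only a minimal sketch using the Python standard library; the helper name visible_devices_seen_by_child is hypothetical and not part of turboSETI.

```python
import os
import subprocess
import sys


def visible_devices_seen_by_child(env=None):
    """Return CUDA_VISIBLE_DEVICES as seen by a freshly spawned Python process.

    Hypothetical helper for debugging; it only inspects the environment and
    does not touch CUDA itself.
    """
    code = "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES', '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,              # None means "inherit the parent's environment"
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Copy the current environment and pin the variable for the child only.
    child_env = dict(os.environ, CUDA_VISIBLE_DEVICES="3")
    print(visible_devices_seen_by_child(child_env))
```

If the value printed on the remote machine is not the one you set locally, the variable is being lost before the process starts, and the GPU choice is not a CUDA problem.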
@lacker Are you able to test my PR with a machine with multiple GPUs?
> Unless CUDA_VISIBLE_DEVICES is actively being removed from the environment (possible by not being passed through an SSH connection?), all CUDA apps should be constrained by that setting. If you can come up with a minimum working example to the contrary it would probably be worthy of a bug report to NVIDIA.
FWIW I tried to repro, but I think my initial comment was wrong. What I misread as a GPU selection error was that, despite running in GPU mode, the turboSETI run was blocked on the CPU for a long time and so wasn't using any GPU. It wasn't misreading the CUDA_VISIBLE_DEVICES variable.
Sigh, I did manage to repro this. I just ran
CUDA_VISIBLE_DEVICES=3 turboSETI /datag/pipeline/AGBT21A_996_44/blc25/blc25_guppi_59383_54743_TIC316468545_0053.rawspec.0000.h5 -g y -o ~/xxx/
on blpc2, and it started running on GPU 0 instead of GPU 3.
@luigi I specified turboSETI -d 3 .... on blpc2 and was assigned to GPU 0 even though GPU 3 was available.
It does run like a bat out of hell but we need to get the gpu assignment down.
That's just the Python interpreter in the obs conda environment; nvidia-smi truncates the name when it displays it. So that's just a normal turboseti run. I have observed work getting assigned to GPU 1 there when giving it --gpu_id=2.
An update: by default, CUDA orders GPUs differently (heuristically aiming for 0=best, see https://stackoverflow.com/questions/13781738/how-does-cuda-assign-device-ids-to-gpus ) than nvidia-smi does (by PCI bus ID). To fix this, set both environment variables, for example CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=3, and turboseti will use the same GPU 3 that nvidia-smi reports as GPU 3.
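For anyone launching runs from a script, the same two variables can be set on the child process environment. This is a rough sketch, assuming the run is started through Python's subprocess module; the helper name pinned_gpu_env is hypothetical.

```python
import os


def pinned_gpu_env(gpu_index, base_env=None):
    """Build an environment that pins a process to one GPU, numbered as in nvidia-smi.

    Hypothetical helper: it returns a copy of the environment with
    CUDA_DEVICE_ORDER and CUDA_VISIBLE_DEVICES set, and leaves the
    caller's own environment untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"       # enumerate GPUs by PCI bus ID
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)  # expose only this GPU
    return env


# Usage sketch (not executed here), reusing the command from this thread:
#   import subprocess
#   subprocess.run(["turboSETI", "filename.h5", "-g", "y"], env=pinned_gpu_env(3))
```

Note that once CUDA_VISIBLE_DEVICES exposes a single GPU, that GPU appears as device 0 inside the process, so any in-process device index should then be 0 rather than 3.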
Fixed in PR #260.
FindDoppler.__init__ (find_doppler.py) neglected to pass gpu_id when instantiating class DATAHandle (data_handler.py), so it defaulted to 0.
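The pattern behind this kind of bug can be shown with a simplified illustration. The classes below (Handle, SearchBuggy, SearchFixed) are hypothetical stand-ins, not the real turboSETI classes or their signatures; they only show how an argument that is not forwarded silently falls back to its default.

```python
class Handle:
    """Stand-in for a data handle that records which GPU it should use."""

    def __init__(self, filename, gpu_id=0):
        self.filename = filename
        self.gpu_id = gpu_id


class SearchBuggy:
    """Accepts gpu_id but forgets to forward it to the handle."""

    def __init__(self, filename, gpu_id=0):
        self.gpu_id = gpu_id
        # Bug: gpu_id is not passed on, so Handle falls back to its default of 0.
        self.handle = Handle(filename)


class SearchFixed:
    """Forwards gpu_id so the handle uses the requested GPU."""

    def __init__(self, filename, gpu_id=0):
        self.gpu_id = gpu_id
        self.handle = Handle(filename, gpu_id=gpu_id)


if __name__ == "__main__":
    print(SearchBuggy("filename.h5", gpu_id=3).handle.gpu_id)
    print(SearchFixed("filename.h5", gpu_id=3).handle.gpu_id)
```

Because the default is a valid GPU index, nothing fails loudly; the work simply lands on GPU 0, which matches what was observed earlier in this thread.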