run_expt.py: --device argument doesn't set the device
vvolhejn opened this issue · 0 comments
Hey, I'm running WILDS on a p2.8xlarge AWS EC2 instance with 8 K80 GPUs. I noticed that when I run `run_expt.py` and use the `--device` argument to divide the jobs I'm trying to run between the GPUs, they all end up running on GPU 0. I verified this both from the memory usage in `nvidia-smi` and by printing the device PyTorch uses via `torch.cuda.current_device()`. My guess is that the `CUDA_VISIBLE_DEVICES` environment variable, set here, is set too late, so PyTorch just defaults to device 0.
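A minimal sketch of what I think is happening (not the actual `run_expt.py` code, just the same ordering): once PyTorch has initialized CUDA, changing `CUDA_VISIBLE_DEVICES` no longer has any effect:

```python
import os
import torch

# CUDA gets initialized here with all GPUs visible, analogous to anything
# in the script touching CUDA before the environment variable is set.
print(torch.cuda.device_count())    # 8 on a p2.8xlarge
print(torch.cuda.current_device())  # 0

# Setting the variable now is too late: the CUDA context already exists,
# so the process keeps using GPU 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"
print(torch.cuda.current_device())  # still 0
```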
I've worked around this by setting the `CUDA_VISIBLE_DEVICES` variable manually, before running the script. I just thought I'd let you know I encountered this issue.
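For reference, the workaround just needs the variable to be in place before PyTorch ever touches CUDA (the GPU index 3 below is only an example): from the shell that means prefixing the command, e.g. `CUDA_VISIBLE_DEVICES=3 python run_expt.py ...`, or equivalently in Python:

```python
import os

# Must be set before torch initializes CUDA -- safest is before `import torch`.
# The GPU index is just an example.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

import torch

print(torch.cuda.device_count())    # 1 -- only the chosen GPU is visible
print(torch.cuda.current_device())  # 0, which now maps to physical GPU 3
```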
Really appreciate the project by the way! Being able to access multiple datasets for domain generalization with the same interface is really useful, and I managed to use `run_expt` pretty easily to run my own experiments.