[PyTorch] torch_cuda.set_device() and torch.manual_seed() hang
haifengl opened this issue · 5 comments
The app hangs if I call either of these functions.
In fact, these functions hang on a multi-GPU box. They work fine on a single-GPU box. I tried the official libtorch build from pytorch.org and saw the same behavior. I suspect that the issue is not with PyTorch but with the cuda-platform package. The cudart.cudaDeviceProp method hangs on a multi-GPU box too.
That basically just calls the CUDA functions through JNI, so I'm guessing that calling them directly from C++ also hangs, right?
The Python wrappers of these methods work fine. I don't think that the C++ functions would hang. I guess that we may have to do some extra initialization work on multi-GPU systems?
I guess not calling cuInit() could do that?
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__INITIALIZE.html
cuInit does the magic. Thanks a lot! It is interesting that we don't need to call it on a single-GPU system.
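For anyone landing here later, a minimal sketch of the workaround: make sure cuInit runs once before any other CUDA or torch call. The `CudaInitGuard` class and its `ensureInitialized` method are hypothetical names made up for this example and are not part of JavaCPP. The init call is passed in as a lambda so the sketch compiles without the CUDA bindings on the classpath.

```java
import java.util.function.IntUnaryOperator;

/**
 * Hypothetical helper (not part of JavaCPP): runs a driver-initialization
 * call at most once, before any other CUDA-related call is made.
 */
final class CudaInitGuard {
    // The init call takes the flags argument and returns a CUresult code.
    private final IntUnaryOperator initCall;
    private boolean initialized = false;

    CudaInitGuard(IntUnaryOperator initCall) {
        this.initCall = initCall;
    }

    /** Calls the init function on first use and is a no-op afterwards. */
    synchronized void ensureInitialized() {
        if (initialized) {
            return;
        }
        // The CUDA driver API documents that the cuInit flags must be 0.
        int status = initCall.applyAsInt(0);
        // CUDA_SUCCESS is 0; anything else means initialization failed.
        if (status != 0) {
            throw new IllegalStateException("cuInit failed with CUresult " + status);
        }
        initialized = true;
    }
}
```

With the presets, this would presumably be wired up as `new CudaInitGuard(flags -> cuInit(flags))` and `ensureInitialized()` called before `torch_cuda.set_device()` or `torch.manual_seed()`. That assumes cuInit is exposed as a static method taking an int and returning an int in the cuda-platform bindings, which I have not verified here.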