Training error: no CUDA-capable device is detected
ksnzh opened this issue · 2 comments
ksnzh commented
I have built PyTorch from source with CUDA 8 and cuDNN 7 and installed PyTorch-Encoding successfully.
But when I try to train the model, it shows the following error.
➜ LightNet git:(master) ✗ python scripts/train_mobile.py
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
- Setting up DataLoader...
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
Found 2975 train images...
Found 500 val images...
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
- Setting up Model...
THCudaCheck FAIL file=/home/kenzhang/pytorch/aten/src/THC/THCGeneral.cpp line=70 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "scripts/train_mobile.py", line 372, in <module>
    train(train_args, data_path, save_path)
  File "scripts/train_mobile.py", line 75, in train
    model = torch.nn.DataParallel(model, device_ids=[0, 1]).cuda()
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 230, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 152, in _apply
    module._apply(fn)
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 152, in _apply
    module._apply(fn)
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 152, in _apply
    module._apply(fn)
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 158, in _apply
    param.data = fn(param.data)
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 230, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /home/kenzhang/pytorch/aten/src/THC/THCGeneral.cpp:70
ksnzh commented
I'm so stupid.
This line https://github.com/ansleliu/LightNet/blob/master/scripts/train_mobile.py#L75 requests two GPU devices (device_ids=[0, 1]), while there is only one GPU in my machine.
Thanks.
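
For anyone hitting the same thing: one way to avoid hard-coding `device_ids=[0, 1]` is to filter the requested IDs against what `torch.cuda.device_count()` reports. This is only a rough sketch, not what the LightNet script does. The helper names `usable_device_ids` and `wrap_for_available_gpus` are made up for illustration, and the CPU fallback is an assumption about what you would want when no GPU is visible.

```python
def usable_device_ids(requested, available_count):
    """Keep only the requested GPU indices that exist on this machine.

    Hypothetical helper: `requested` is the list you would otherwise
    hard-code, `available_count` is what torch.cuda.device_count() returns.
    """
    return [i for i in requested if 0 <= i < available_count]


def wrap_for_available_gpus(model, requested=(0, 1)):
    """Move `model` to whatever GPUs are actually present (hypothetical helper)."""
    # Imported lazily so usable_device_ids() can be used without PyTorch installed.
    import torch

    ids = usable_device_ids(requested, torch.cuda.device_count())
    if not ids:
        # No visible CUDA device: leave the model on the CPU (assumed fallback).
        return model
    if len(ids) > 1:
        # Only wrap in DataParallel when more than one GPU is really available.
        model = torch.nn.DataParallel(model, device_ids=ids)
    return model.cuda(ids[0])


if __name__ == "__main__":
    # Requesting GPUs 0 and 1 on a one-GPU machine keeps only GPU 0.
    print(usable_device_ids([0, 1], 1))
```

In the training script, line 75 could then become something like `model = wrap_for_available_gpus(model, requested=[0, 1])`, so the same code runs on one-GPU and two-GPU machines.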