linksense/LightNet

Train error with no CUDA-capable device is detected

ksnzh opened this issue · 2 comments

ksnzh commented

I have built PyTorch from source with CUDA 8 and cuDNN 7 and installed PyTorch-Encoding successfully.
But when I try to train the model, I get the following error:

➜ LightNet git:(master) ✗ python scripts/train_mobile.py

+++++++++++++++++++++++++++++++++++++++++++++++++++++++

  1. Setting up DataLoader...

+++++++++++++++++++++++++++++++++++++++++++++++++++++++

Found 2975 train images...
Found 500 val images...

+++++++++++++++++++++++++++++++++++++++++++++++++++++++

  1. Setting up Model...
THCudaCheck FAIL file=/home/kenzhang/pytorch/aten/src/THC/THCGeneral.cpp line=70 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "scripts/train_mobile.py", line 372, in <module>
    train(train_args, data_path, save_path)
  File "scripts/train_mobile.py", line 75, in train
    model = torch.nn.DataParallel(model, device_ids=[0, 1]).cuda()
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 230, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 152, in _apply
    module._apply(fn)
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 152, in _apply
    module._apply(fn)
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 152, in _apply
    module._apply(fn)
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 158, in _apply
    param.data = fn(param.data)
  File "/opt/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 230, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /home/kenzhang/pytorch/aten/src/THC/THCGeneral.cpp:70
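
When a traceback like this appears, a quick first check is what the Python process itself can see before any .cuda() call. The sketch below uses only standard queries (torch.cuda.is_available, torch.cuda.device_count, torch.version.cuda) plus the CUDA_VISIBLE_DEVICES environment variable; describe_cuda_env is a hypothetical helper name, not part of LightNet or PyTorch, and torch is imported lazily so the script still runs where PyTorch is missing.

```python
import os


def describe_cuda_env() -> dict:
    """Summarize the GPU setup visible to this Python process (hypothetical helper)."""
    # CUDA_VISIBLE_DEVICES restricts (or, if empty, hides) the GPUs CUDA exposes.
    info = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch  # lazy import so the check also works without PyTorch
    except ImportError:
        info["torch_installed"] = False
        return info

    info["torch_installed"] = True
    info["torch_cuda_version"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    info["device_count"] = torch.cuda.device_count()  # valid ids are 0 .. count-1
    return info


if __name__ == "__main__":
    print(describe_cuda_env())
```

If device_count is smaller than the number of ids passed to DataParallel, the hard-coded device list is the first thing to look at.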
ksnzh commented

I'm so stupid.
This line, https://github.com/ansleliu/LightNet/blob/master/scripts/train_mobile.py#L75, uses two GPU devices (device_ids=[0, 1]), but my machine has only one GPU.
Thanks.
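
For machines with fewer GPUs than the script expects, one option is to derive device_ids from what PyTorch reports instead of hard-coding [0, 1]. This is a minimal sketch, not the LightNet authors' fix: select_device_ids and wrap_model are hypothetical helpers, and the default (0, 1) simply mirrors line 75 of scripts/train_mobile.py. DataParallel, Module.cuda and torch.cuda.device_count are standard PyTorch calls.

```python
from typing import List, Sequence


def select_device_ids(requested: Sequence[int], available: int) -> List[int]:
    """Keep only the requested GPU ids that exist on this machine.

    `available` is the number of visible GPUs, e.g. torch.cuda.device_count().
    """
    return [i for i in requested if 0 <= i < available]


def wrap_model(model, requested: Sequence[int] = (0, 1)):
    """Hypothetical drop-in for the hard-coded DataParallel line."""
    import torch  # lazy import so select_device_ids is usable without PyTorch

    ids = select_device_ids(requested, torch.cuda.device_count())
    if not ids:
        # No usable GPU: keep the model on the CPU instead of calling .cuda().
        return model
    # DataParallel expects the module's parameters on device_ids[0].
    return torch.nn.DataParallel(model, device_ids=ids).cuda(ids[0])
```

On a single-GPU machine, select_device_ids([0, 1], 1) keeps only [0], so model = wrap_model(model) runs DataParallel on one device rather than requesting a second GPU that does not exist.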

@ksnzh ok, it doesn't matter ^_^