CUDA Error

Question


Ronneesley opened this issue.

When I ran python3 main.py, I got the following message, even after running all the installation commands:

CUDA 1
File name prefix GraphRNN_RNN_grid_4_128_
graph_validate_len 199.5
graph_test_len 215.0
total graph num: 100, training set: 80
max number node: 361
max/min number edge: 684; 180
max previous node: 40
train and test graphs saved at:  ./graphs/GraphRNN_RNN_grid_4_128_test_0.dat
PERSONAL_PATH/GraphRNN/model.py:299: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(param,gain=nn.init.calculate_gain('sigmoid'))
PERSONAL_PATH/GraphRNN/model.py:297: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  nn.init.constant(param, 0.25)
PERSONAL_PATH/GraphRNN/model.py:302: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCGeneral.cpp line=51 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "main.py", line 128, in <module>
    has_output=True, output_size=args.hidden_size_rnn_output).cuda()
  File "/home/ronneesley/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 265, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/ronneesley/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/home/ronneesley/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 199, in _apply
    param.data = fn(param.data)
  File "/home/ronneesley/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 265, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/ronneesley/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCGeneral.cpp:51
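
The "CUDA 1" line at the top of the log suggests that main.py prints the self.cuda value from args.py and uses it as a GPU index. Assuming it is exported as CUDA_VISIBLE_DEVICES (this is not shown in the thread), a value of 1 on a machine whose only GPU is index 0 hides every device, which matches "no CUDA-capable device is detected". The same error also appears when no GPU or driver is present at all. The sketch below models the documented enumeration rule: only listed indices are visible, in the listed order, and an invalid index hides itself and everything after it. The helper visible_gpu_indices is hypothetical and handles integer indices only (the real variable also accepts GPU UUIDs).

```python
from typing import List, Optional


def visible_gpu_indices(cuda_visible_devices: Optional[str], physical_gpu_count: int) -> List[int]:
    """Hypothetical model of how CUDA_VISIBLE_DEVICES filters physical GPUs.

    Sketch only: integer indices are supported; UUID entries are treated as invalid.
    """
    if cuda_visible_devices is None:
        # Variable unset: every physical GPU is visible.
        return list(range(physical_gpu_count))
    visible: List[int] = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        try:
            index = int(token)
        except ValueError:
            break  # non-integer (or empty) entry: treated as invalid here
        if not 0 <= index < physical_gpu_count:
            break  # invalid index: it and every entry after it are ignored
        visible.append(index)
    return visible


# One physical GPU (index 0): "1" hides it, "0" exposes it.
print(visible_gpu_indices("1", 1))  # []
print(visible_gpu_indices("0", 1))  # [0]
```

Inside a PyTorch process, torch.cuda.is_available() and torch.cuda.device_count() report the result of this filtering, so checking them before calling .cuda() gives a clearer failure than the runtime error above.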

Answer 1 · 2019-06-12T19:55:41.000Z

Hello again,

I changed 'self.cuda = 1' to 'self.cuda = 0' in args.py, but I got another error:

CUDA 0
File name prefix GraphRNN_RNN_grid_4_128_
graph_validate_len 199.5
graph_test_len 215.0
total graph num: 100, training set: 80
max number node: 361
max/min number edge: 684; 180
max previous node: 40
train and test graphs saved at:  ./graphs/GraphRNN_RNN_grid_4_128_test_0.dat
PERSONAL_PATH/GraphRNN/model.py:299: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(param,gain=nn.init.calculate_gain('sigmoid'))
PERSONAL_PATH/GraphRNN/model.py:297: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  nn.init.constant(param, 0.25)
PERSONAL_PATH/GraphRNN/model.py:302: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/home/ronneesley/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "main.py", line 134, in <module>
    train(args, dataset_loader, rnn, output)
  File "PERSONAL_PATH/GraphRNN/train.py", line 682, in train
    scheduler_rnn, scheduler_output)
  File "PERSONAL_PATH/GraphRNN/train.py", line 516, in train_rnn_epoch
    log_value('loss_'+args.fname, loss.data[0], epoch*args.batch_ratio+batch_idx)
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
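
The UserWarnings in this log do not stop the run, but they point to calls deprecated in newer PyTorch releases: nn.init.constant and nn.init.xavier_uniform in favor of their in-place versions with a trailing underscore, and nn.functional.sigmoid in favor of torch.sigmoid. A minimal sketch of the same initialization pattern with the current names, on a hypothetical TinyGRU module whose layout and sizes are assumptions, not GraphRNN's actual model:

```python
import torch
import torch.nn as nn


class TinyGRU(nn.Module):
    """Hypothetical stand-in model; not the class defined in GraphRNN's model.py."""

    def __init__(self, input_size: int = 8, hidden_size: int = 16):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)
        # In-place initializers (trailing underscore) replace the deprecated ones.
        for name, param in self.rnn.named_parameters():
            if "bias" in name:
                nn.init.constant_(param, 0.25)
            elif "weight" in name:
                nn.init.xavier_uniform_(param, gain=nn.init.calculate_gain("sigmoid"))
        for m in self.modules():
            if isinstance(m, nn.Linear):
                # Modifies m.weight directly; no need to reassign .data.
                nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain("relu"))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(x)
        # torch.sigmoid replaces the deprecated nn.functional.sigmoid.
        return torch.sigmoid(self.out(h))
```

For example, TinyGRU()(torch.zeros(2, 5, 8)) returns a tensor of shape (2, 5, 1) with values in (0, 1).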

Answer 2 · 2019-08-13T13:52:52.000Z

I believe it is a PyTorch version problem. You can change loss.data[0] to loss.item(), and then the test code runs.
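
In PyTorch 0.4 and later, a reduced loss such as a mean is a 0-dim tensor, and current releases raise the IndexError shown above when it is indexed with [0]; .item() returns the value as a Python number instead. A minimal sketch with a made-up MSE loss (the tensors are illustrative values, not GraphRNN data):

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([1.0, 2.0])
target = torch.tensor([1.0, 4.0])
loss = F.mse_loss(pred, target)  # default reduction="mean" gives a 0-dim tensor

print(loss.dim())  # 0
value = loss.item()  # Python float: (0**2 + 2**2) / 2 = 2.0

try:
    loss.data[0]  # the old idiom from train.py
    old_idiom_fails = False
except IndexError:
    old_idiom_fails = True
```

Applied to the line from train.py in the traceback, the call would become log_value('loss_'+args.fname, loss.item(), epoch*args.batch_ratio+batch_idx).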