ICDAR2015 training RuntimeError

Question

jwnsu opened this issue 5 years ago · 5 comments

I got the following error when training with 1 GPU (Ubuntu 16.04, PyTorch 1.1 / CUDA 10, GTX 1080 Ti):

  File "./train.py", line 93, in <module>
    main(config, args.resume)
  File "./train.py", line 60, in main
    trainer.train()
  File "/home/dsu/ai/fots/base/base_trainer.py", line 79, in train
    result = self._train_epoch(epoch)
  File "/home/dsu/ai/fots/trainer/trainer.py", line 90, in _train_epoch
    training_mask)
  File "/home/dsu/p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dsu/ai/fots/model/loss.py", line 90, in forward
    recognition_loss = self.recognition_loss(y_true_recog, y_pred_recog)
  File "/home/dsu/p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dsu/ai/fots/model/loss.py", line 61, in forward
    loss = self.ctc_loss(pred[0], gt[0], pred[1], gt[1])
  File "/home/dsu/p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dsu/p36/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1332, in forward
    self.zero_infinity)
  File "/home/dsu/p36/lib/python3.6/site-packages/torch/nn/functional.py", line 1813, in ctc_loss
    zero_infinity)
RuntimeError: Tensor for argument #2 'targets' is on CPU, but expected it to be on GPU (while checking arguments for ctc_loss_gpu)

Has anyone encountered this error? Thanks.

CPU training seems to work fine (but very slow).

P.S. Multi-GPU training hit a different error:

  File "./train.py", line 93, in <module>
    main(config, args.resume)
  File "./train.py", line 60, in main
    trainer.train()
  File "/home/dsu/ai/fots/base/base_trainer.py", line 79, in train
    result = self._train_epoch(epoch)
  File "/home/dsu/ai/fots/trainer/trainer.py", line 74, in _train_epoch
    mapping)
  File "/home/dsu/p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 142, in forward
    for t in chain(self.module.parameters(), self.module.buffers()):
AttributeError: 'FOTSModel' object has no attribute 'buffers'

Answer 1 · 2019-06-10T03:29:58.000Z

@jwnsu For the first error, I think you should check your ground truth: is it loaded on the GPU? Have you called your_cuda_gt = your_gt.cuda()?
As for the second one, I will reply later once I have confirmed it.

Answer 2 · 2019-06-10T04:39:43.000Z

I tried moving gt to CUDA and got the following error:

File "/home/dsu/ai/fots/model/loss.py", line 61, in forward
    gt = gt.cuda()
AttributeError: 'tuple' object has no attribute 'cuda'

Answer 3 · 2019-06-10T06:31:16.000Z

@jwnsu
I think you need to read the error message: gt is a tuple that contains several fields, so you cannot call .cuda() on the tuple itself. You only need to move the specific fields inside it to CUDA.
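A minimal sketch of that idea, assuming (from the `self.ctc_loss(pred[0], gt[0], pred[1], gt[1])` call in the traceback) that gt is a tuple of tensors such as (targets, target_lengths). The helper name `move_to_device` is hypothetical and not part of the FOTS repository. It uses duck typing, so it moves anything with a `.to(device)` method, which includes `torch.Tensor`:

```python
def move_to_device(obj, device):
    """Recursively move tensors nested in tuples/lists to `device`.

    Objects exposing a `.to(device)` method (e.g. torch.Tensor) are moved,
    plain tuples and lists are rebuilt with their elements moved, and
    everything else is returned unchanged.
    """
    if isinstance(obj, (tuple, list)):
        # Rebuild the container; tuples are immutable, so a new one is needed.
        return type(obj)(move_to_device(item, device) for item in obj)
    if hasattr(obj, "to"):
        return obj.to(device)
    return obj


# Hypothetical use inside the loss's forward(), before the CTC call:
#   gt = move_to_device(gt, pred[0].device)
#   loss = self.ctc_loss(pred[0], gt[0], pred[1], gt[1])
```

Taking the device from `pred[0]` rather than hard-coding `.cuda()` keeps the same code working for CPU training. Moving only the field named in the error (`gt[0]`, the targets) would also be enough to address that specific message.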

Answer 4 · 2019-06-16T22:20:09.000Z

Thanks for the response, it now works fine.

Answer 5 · 2019-10-11T07:41:55.000Z

@jwnsu How did you solve the second error?