gidariss/FewShotWithoutForgetting

Error when training

Opened this issue · 7 comments

When executing the command below:
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128CosineClassifier

It prompts:

Exception KeyError: KeyError(<weakref at 0x7f619db132b8; to 'tqdm' at 0x7f619db23090>,) in <bound method tqdm.__del__ of
  0%|                                                                 | 0/2000 [00:00<?, ?it/s]> ignored
Traceback (most recent call last):
  File "train.py", line 110, in <module>
    algorithm.solve(dloader_train, dloader_test)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/Algorithm.py", line 286, in solve
    eval_stats = self.evaluate(data_loader_test)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/Algorithm.py", line 330, in evaluate
    eval_stats_this = self.evaluation_step(batch)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/FewShot.py", line 84, in evaluation_ste
p
    return self.process_batch(batch, do_train=False)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/FewShot.py", line 87, in process_batch
    process_type = self.set_tensors(batch)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/FewShot.py", line 60, in set_tensors
    nKnovel = 1 + labels_train.max() - self.nKbase
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #3 'other'

Environment:
Python 2.7
PyTorch 0.4 @ CUDA 9.1

caiqi commented

@xwjabc I met the same problem, and I'm not familiar with PyTorch, but changing this line https://github.com/gidariss/FewShotWithoutForgetting/blob/master/algorithms/FewShot.py#L55 to `self.nKbase = nKbase.squeeze()[0].cuda()` fixed the problem for me.

@caiqi Thx! Will take a look. I am also a newbie to PyTorch and am trying to trace the cause of that error.

Found the reason. In PyTorch 0.4, `x.squeeze()[0]` no longer returns a Python scalar but a 0-dim tensor. This causes several compatibility problems (e.g. the `nKbase` error and the `DAverageMeter` errors). Will post a patch list later.
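A minimal sketch of the behavior change (using made-up values, not the repo's actual tensors):

```python
import torch

# In PyTorch >= 0.4, indexing a tensor yields a 0-dim tensor rather
# than a Python number, so nKbase stays a (CPU) LongTensor and later
# arithmetic against CUDA tensors raises the type-mismatch error above.
nKbase_raw = torch.LongTensor([[3, 7]])
nKbase = nKbase_raw.squeeze()[0]   # 0-dim torch.LongTensor, not an int
assert isinstance(nKbase, torch.Tensor)

# .item() extracts a plain Python int, which mixes safely with both
# CPU and CUDA tensors in arithmetic.
assert nKbase.item() == 3
```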

@xwjabc I met possibly the same DAverageMeter error (AccuracyNovel is missing). Could you please tell me how to fix it?

@jin-s13 Could you add some more details about the error?

@jin-s13 My suggestion, if you are still interested: add `.item()` to the return value of the `top1accuracy()` function wherever you compute the accuracies for Novel, Base, or Both. This turns the values stored in `loss_record` into plain scalars for the aforementioned accuracies.
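For illustration, a hypothetical sketch of such a helper with the suggested fix applied (the function name and signature are assumptions, not the repo's exact code):

```python
import torch

def top1accuracy(output, target):
    # Compare the highest-scoring class per row against the labels.
    pred = output.argmax(dim=1)
    acc = (pred == target).float().mean() * 100.0
    # .item() returns a Python float instead of a 0-dim tensor, so
    # DAverageMeter accumulates plain scalars (PyTorch >= 0.4).
    return acc.item()
```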

Here is my solution:

```python
labels_train = self.tensors['labels_train']

# .item() converts the 0-dim tensor to a Python int (PyTorch >= 0.4)
nKnovel = 1 + labels_train.max().item() - self.nKbase

labels_train_1hot_size = list(labels_train.size()) + [nKnovel,]
labels_train_unsqueeze = labels_train.unsqueeze(dim=labels_train.dim())
self.tensors['labels_train_1hot'].resize_(labels_train_1hot_size).fill_(0).scatter_(
    len(labels_train_1hot_size) - 1, (labels_train_unsqueeze - self.nKbase).cuda(), 1)
```
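A self-contained CPU sketch of the one-hot construction above, with made-up shapes (2 base classes, novel labels numbered from `nKbase` upward):

```python
import torch

nKbase = 2
labels_train = torch.LongTensor([[2, 3, 2]])           # (batch, nExamples)
nKnovel = 1 + labels_train.max().item() - nKbase       # .item() -> Python int

# Build a (batch, nExamples, nKnovel) one-hot tensor: shift the novel
# labels to start at 0, then scatter 1s along the last dimension.
one_hot = torch.zeros(list(labels_train.size()) + [nKnovel])
index = (labels_train - nKbase).unsqueeze(dim=labels_train.dim())
one_hot.scatter_(labels_train.dim(), index, 1)
```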