Unrecognized tensor type ID: AutogradCUDA

Question

jkkishore1999 opened this issue 4 years ago · 6 comments

jkkishore1999 commented 4 years ago

When I was running the following script for fine-tuning on refcoco,
! bash ./scripts/nondist_run.sh refcoco/train_end2end.py 'cfgs/refcoco/base_gt_boxes_4x16G.yaml' refcoco_base_gt_ckpt

I encountered the following error.

[Partial Load] non matched keys: ['object_mask_word_embedding.weight', 'aux_text_visual_embedding.weight', 'vlbert.mlm_head.predictions.bias', 'vlbert.mlm_head.predictions.transform.dense.weight', 'vlbert.mlm_head.predictions.transform.dense.bias', 'vlbert.mlm_head.predictions.transform.LayerNorm.weight', 'vlbert.mlm_head.predictions.transform.LayerNorm.bias', 'vlbert.mlm_head.predictions.decoder.weight', 'vlbert.mvrc_head.region_cls_pred.weight', 'vlbert.mvrc_head.region_cls_pred.bias']
[Partial Load] non pretrain keys: ['final_mlp.2.weight', 'final_mlp.2.bias']
PROGRESS: 0.00%
/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/fast_rcnn.py:136: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
box_inds = box_mask.nonzero()
Traceback (most recent call last):
File "refcoco/train_end2end.py", line 60, in <module>
main()
File "refcoco/train_end2end.py", line 54, in main
rank, model = train_net(args, config)
File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../refcoco/function/train.py", line 323, in train_net
gradient_accumulate_steps=config.TRAIN.GRAD_ACCUMULATE_STEPS)
File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/trainer.py", line 115, in train
outputs, loss = net(*batch)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/module.py", line 22, in forward
return self.train_forward(*inputs, **kwargs)
File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../refcoco/modules/resnet_vlbert_for_refcoco.py", line 96, in train_forward
segms=None)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/fast_rcnn.py", line 149, in forward
roi_align_res = self.roi_align(img_feats['body4'], rois).type(images.dtype)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/lib/roi_pooling/roi_align.py", line 69, in forward
input.float(), rois.float(), self.output_size, self.spatial_scale, self.sampling_ratio
File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/lib/roi_pooling/roi_align.py", line 20, in forward
input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio
RuntimeError: Unrecognized tensor type ID: AutogradCUDA

I am running on Google Colab with PyTorch 1.7.0, torchvision 0.8.1, and CUDA 10.1. The same error occurs with CUDA 9.2. When I use PyTorch 1.1.0, as mentioned in the README, I get many errors related to torchvision modules. Please help; I urgently need this to complete my project.

Answer 1 · 2020-11-19T02:31:48.000Z

Has this been solved? I have run into the same problem.

Answer 2 · 2020-11-19T02:37:57.000Z

I have not been able to solve it yet. Please let me know if you find a solution.

Answer 3 · 2020-11-29T08:30:13.000Z

@jkkishore1999 @G-Apple1 Could you provide more information about your environment, especially the versions of GCC, CUDA, and PyTorch?
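One way to gather the versions requested above is a short script like the following. It is only a sketch: the helper names (`module_version`, `gcc_version`, `cuda_build_version`, `collect_environment`) are made up for this illustration and are not part of VL-BERT, and the script reports `None` for anything it cannot find rather than failing.

```python
import importlib
import platform
import shutil
import subprocess


def module_version(name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:  # missing or broken installation
        return None
    return getattr(module, "__version__", None)


def gcc_version():
    """Return the first line of `gcc --version`, or None if gcc is not on PATH."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    try:
        result = subprocess.run(
            [gcc, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # works on Python 3.6, unlike text=True
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def cuda_build_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except Exception:
        return None
    return getattr(torch.version, "cuda", None)


def collect_environment():
    """Collect the versions that matter for compiled CUDA extensions."""
    return {
        "python": platform.python_version(),
        "torch": module_version("torch"),
        "torchvision": module_version("torchvision"),
        "cuda (torch build)": cuda_build_version(),
        "gcc": gcc_version(),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

PyTorch also ships its own report, `python -m torch.utils.collect_env`, which prints a more detailed summary when torch is importable.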

Answer 4 · 2020-11-29T09:23:12.000Z

I am running on Google Colab with PyTorch 1.7.0, torchvision 0.8.1, and CUDA 10.1. The same error occurs with CUDA 9.2. When I use PyTorch 1.1.0, as mentioned in the README, I get many errors related to torchvision modules. Please help; I urgently need this to complete my project.

Answer 5 · 2020-11-29T09:25:50.000Z

me too

Answer 6 · 2020-12-09T02:19:49.000Z

@jkkishore1999 @G-Apple1 I haven't tested the code on torch 1.7.0. I think you need to use torch 1.1.0, together with the corresponding version of torchvision (e.g., 0.3.0; see this page for the correspondence).
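To compare an installed environment against the versions suggested above, a small check like this can help. It is a sketch only: `RECOMMENDED` holds just the pair named in this thread (torch 1.1.0 with torchvision 0.3.0), and `base_version` and `find_mismatches` are hypothetical helpers, not part of VL-BERT or PyTorch.

```python
# Versions suggested in this thread; an assumption, not a full compatibility table.
RECOMMENDED = {"torch": "1.1.0", "torchvision": "0.3.0"}


def base_version(version):
    """Strip a local build suffix such as '+cu101' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed, recommended=RECOMMENDED):
    """Return {package: (installed, recommended)} for each package that differs."""
    mismatches = {}
    for package, wanted in recommended.items():
        found = installed.get(package)
        if found is None or base_version(found) != wanted:
            mismatches[package] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    # The versions reported by the original poster.
    reported = {"torch": "1.7.0", "torchvision": "0.8.1"}
    for package, (found, wanted) in find_mismatches(reported).items():
        print(f"{package}: installed {found}, recommended {wanted}")
```

In a live session, the `installed` dictionary could be filled from `torch.__version__` and `torchvision.__version__` instead of hard-coded strings.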