eval.py error: RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

Question

stealth0414 opened this issue 2 years ago · 7 comments

After finishing training on my own dataset, I tried running eval.py and got this error:
Traceback (most recent call last):
  File "eval.py", line 193, in <module>
    main()
  File "eval.py", line 79, in main
    Eval(experiment, experiment_args, cmd=args, verbose=args['verbose']).eval(args['visualize'])
  File "eval.py", line 164, in eval
    model = self.init_model()
  File "eval.py", line 107, in init_model
    model = self.structure.builder.build(self.device)
  File "/hy-tmp/DB-yanhua/structure/builder.py", line 24, in build
    model = Model(self.model_args, device,
  File "/hy-tmp/DB-yanhua/structure/model.py", line 37, in __init__
    self.model = BasicModel(args)
  File "/hy-tmp/DB-yanhua/structure/model.py", line 15, in __init__
    self.backbone = getattr(backbones, args['backbone'])(**args.get('backbone_args', {}))
  File "/hy-tmp/DB-yanhua/backbones/resnet.py", line 310, in deformable_resnet50
    model.load_state_dict(model_zoo.load_url(
  File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 731, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 905, in _legacy_load
    return legacy_load(f)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 841, in legacy_load
    tensor = torch.tensor([], dtype=storage.dtype).set_(
RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

def deformable_resnet50(pretrained=True, **kwargs):
    """Constructs a ResNet-50 model with deformable conv.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3],
                   dcn=dict(modulated=True,
                            deformable_groups=1,
                            fallback_on_stride=False),
                   stage_with_dcn=[False, True, True, True],
                   **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(
            model_urls['resnet50']), strict=False)
    return model
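The last frame of the traceback shows where this fails: the legacy loader in `torch/serialization.py` creates an empty tensor with `torch.tensor([], dtype=storage.dtype)` and then attaches the checkpoint's storage to it with `set_`. That empty tensor is created on the default device, so the error means the default was cuda:0 while the downloaded ResNet-50 storage was restored on the CPU. This typically happens when code calls something like `torch.set_default_tensor_type('torch.cuda.FloatTensor')` before building the model; the thread does not show where this repo does that, so treat it as an assumption. Under that assumption, one workaround is to switch the default back to CPU only while the pretrained weights load. The sketch below uses a hypothetical helper name, `cpu_default_tensor_type`; note that `torch.set_default_tensor_type` is deprecated as of PyTorch 2.1 but still available.

```python
import contextlib

import torch


@contextlib.contextmanager
def cpu_default_tensor_type():
    """Hypothetical helper: temporarily make CPU float32 the default tensor type.

    Assumes the caller switched the default to a CUDA tensor type earlier.
    Inside the block, torch.tensor([]) is created on the CPU again, which matches
    the CPU storages of a checkpoint saved from CPU tensors. The previous default
    is restored on exit, even if loading raises.
    """
    previous = torch.tensor([]).type()  # e.g. 'torch.cuda.FloatTensor'
    torch.set_default_tensor_type(torch.FloatTensor)
    try:
        yield
    finally:
        torch.set_default_tensor_type(previous)


# Hypothetical use inside deformable_resnet50:
#     if pretrained:
#         with cpu_default_tensor_type():
#             state_dict = model_zoo.load_url(model_urls['resnet50'])
#         model.load_state_dict(state_dict, strict=False)
```

`load_state_dict` copies the CPU values into the model's existing parameters, so the model itself can still live on the GPU.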

Why does eval still need to load the ResNet-50 pretrained weights after training has finished? Has anyone solved this?
I also tried commenting out the `if pretrained:` block, but then all the metrics are 0:
[INFO] [2023-03-17 18:04:29,041] precision : 0.000000 (44)
[INFO] [2023-03-17 18:04:29,042] recall : 0.000000 (44)
[INFO] [2023-03-17 18:04:29,042] fmeasure : 0.000000 (1)
Thanks!

Answer 1 · 2023-04-17T06:29:36.000Z

I ran into the same problem. Did you manage to solve it?

Answer 2 · 2023-04-17T06:58:23.000Z

I don't remember, but you could try training on top of the author's pretrained model.

Answer 3 · 2023-04-17T09:23:35.000Z

It may be a PyTorch issue: I also get this error with version 11.3 on Linux, but it works fine on my own computer with 10.2.

Answer 4 · 2023-04-17T12:15:11.000Z

OK, thank you.

Answer 5 · 2024-08-03T08:51:59.000Z

Did you solve it? I'm running into the same problem.

Answer 6 · 2024-09-25T07:23:44.000Z

I found a solution here:
https://stackoverflow.com/questions/71643035/runtimeerror-attempted-to-set-the-storage-of-a-tensor-on-device-cuda0-to-a-s
You can all refer to it.

Answer 7 · 2024-10-03T11:30:43.000Z

I have experienced this issue and this is how I resolved it:
The issue traces back to line 46 of resnet.py. For training and validation I changed that line to:
pretrained_dict = model_zoo.load_url(url)
But that does not work for eval, so for eval I change the line to:
pretrained_dict = model_zoo.load_url(url, map_location=device)
I have no idea how to solve it completely, but it works fine for now.
Hope it helps!
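The `map_location=device` change works because `map_location` decides which device the checkpoint's storages are restored to. Instead of editing the line by hand between training and eval, one option is to restore the storages onto whatever device `torch.tensor([])` currently lands on, since that is where the legacy loader builds the empty tensor it calls `set_` on; the two then agree in both modes. A minimal sketch with hypothetical helper names (`default_map_location`, `load_pretrained_weights`), assuming `model_zoo` is `torch.utils.model_zoo`, as the traceback's call into `torch/hub.py` suggests:

```python
import torch
import torch.utils.model_zoo as model_zoo


def default_map_location():
    """Hypothetical helper: the device new tensors are created on by default.

    torch.tensor([]) follows the default tensor type/device, just like the empty
    tensor that torch.serialization's legacy loader creates before calling set_().
    """
    return torch.tensor([]).device


def load_pretrained_weights(model, url, strict=False):
    """Hypothetical helper: download the weights at url and load them into model.

    Restoring the storages onto default_map_location() keeps them on the same
    device as the loader's empty tensor, whether or not CUDA is the default.
    """
    state_dict = model_zoo.load_url(url, map_location=default_map_location())
    return model.load_state_dict(state_dict, strict=strict)
```

In `deformable_resnet50`, `load_pretrained_weights(model, model_urls['resnet50'])` would take the place of the original `model.load_state_dict(model_zoo.load_url(...), strict=False)` call.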