MhLiao/DB

eval.py error: RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

stealth0414 opened this issue · 7 comments

After training on my own dataset, I tried to run eval.py and got the following error:
Traceback (most recent call last):
  File "eval.py", line 193, in <module>
    main()
  File "eval.py", line 79, in main
    Eval(experiment, experiment_args, cmd=args, verbose=args['verbose']).eval(args['visualize'])
  File "eval.py", line 164, in eval
    model = self.init_model()
  File "eval.py", line 107, in init_model
    model = self.structure.builder.build(self.device)
  File "/hy-tmp/DB-yanhua/structure/builder.py", line 24, in build
    model = Model(self.model_args, device,
  File "/hy-tmp/DB-yanhua/structure/model.py", line 37, in __init__
    self.model = BasicModel(args)
  File "/hy-tmp/DB-yanhua/structure/model.py", line 15, in __init__
    self.backbone = getattr(backbones, args['backbone'])(**args.get('backbone_args', {}))
  File "/hy-tmp/DB-yanhua/backbones/resnet.py", line 310, in deformable_resnet50
    model.load_state_dict(model_zoo.load_url(
  File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 731, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 905, in _legacy_load
    return legacy_load(f)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 841, in legacy_load
    tensor = torch.tensor([], dtype=storage.dtype).set_(
RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

def deformable_resnet50(pretrained=True, **kwargs):
    """Constructs a ResNet-50 model with deformable conv.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3],
                   dcn=dict(modulated=True,
                            deformable_groups=1,
                            fallback_on_stride=False),
                   stage_with_dcn=[False, True, True, True],
                   **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(
            model_urls['resnet50']), strict=False)
    return model
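
For context on the strict=False argument in the function above: load_state_dict then tolerates keys that do not line up and reports them in its return value instead of raising. The sketch below illustrates this with a tiny stand-in module. TinyBackbone and the "fc.weight" key are hypothetical and only serve the illustration; they are not part of the DB repo.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a backbone; not the DB ResNet."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        # Stands in for layers that the checkpoint does not contain.
        self.extra = nn.Conv2d(8, 8, kernel_size=3)


# Build a partial checkpoint: only the "conv." entries, plus one key
# that the target model does not have.
donor = TinyBackbone()
partial_state = {k: v for k, v in donor.state_dict().items()
                 if k.startswith("conv.")}
partial_state["fc.weight"] = torch.zeros(1)

model = TinyBackbone()
# strict=False: mismatched keys are reported instead of raising an error.
result = model.load_state_dict(partial_state, strict=False)

print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```

Printing both lists after loading is a cheap way to see which layers actually received weights from a checkpoint.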

Why does eval still need to load the ResNet-50 pretrained weights after training? Has anyone solved this?
I also tried commenting out the if pretrained block, but then the metrics are all 0:
[INFO] [2023-03-17 18:04:29,041] precision : 0.000000 (44)
[INFO] [2023-03-17 18:04:29,042] recall : 0.000000 (44)
[INFO] [2023-03-17 18:04:29,042] fmeasure : 0.000000 (1)
thanks

I ran into the same problem. Did you manage to solve it?

I don't remember, but you could try training starting from the author's pretrained model.

It may be a PyTorch issue. I also hit this problem on Linux with version 11.3, but on my own computer with 10.2 there is no problem.
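
The version numbers in the comment above are not labeled in this thread. When comparing two machines, it may help to print the PyTorch version and the CUDA version it was built against side by side. This is a minimal sketch; describe_environment is a hypothetical helper name.

```python
import platform

import torch


def describe_environment():
    """Collect the version details that usually differ between machines."""
    return {
        "python": platform.python_version(),
        "torch": torch.__version__,
        # CUDA version PyTorch was built with; None on CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_environment()
for name, value in info.items():
    print(f"{name}: {value}")
```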

OK, thank you.

Did you manage to solve it? I am running into the same problem.

I have run into this issue, and this is how I worked around it:
The issue traces back to line 46 of the script resnet.py. For training and validation I changed the line to:
pretrained_dict = model_zoo.load_url(url)
But that does not work for eval. For eval I change the line to:
pretrained_dict = model_zoo.load_url(url, map_location=device)
I have no idea how to solve it completely, but it works fine for now.
Hope it helps!
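
A note on the workaround above. The last traceback frame creates an empty tensor with torch.tensor([], ...) and attaches the loaded CPU storage to it, so the error suggests that new tensors default to cuda:0 during eval. That is an inference from the traceback, not something confirmed in this thread. Passing map_location makes the loaded storages land on the intended device. The sketch below shows the pattern with a tiny stand-in model and an in-memory checkpoint instead of the real backbone and URL. load_weights is a hypothetical helper, not part of the DB repo.

```python
import io

import torch
import torch.nn as nn


def load_weights(model, checkpoint, device):
    """Load a state dict onto `device`, then move the model there too."""
    # map_location remaps every stored tensor to `device` while loading.
    state = torch.load(checkpoint, map_location=device)
    result = model.load_state_dict(state, strict=False)
    return model.to(device), result


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for a downloaded checkpoint: save a small model to memory.
source = nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

model, result = load_weights(nn.Linear(4, 2), buffer, device)
print("loaded on:", next(model.parameters()).device)
```

model_zoo.load_url takes the same map_location argument, and the traceback shows it being forwarded to torch.load.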