Errors with non-CUDA machine
Discussed in #81
I am having a similar problem to the one described in the post quoted below, even though it appears that MapClassifier.py has been updated to incorporate the fix that the other user described.
When I try to use "Train Your Network", I get this error:

    Traceback (most recent call last):
      File "TagLab.py", line 4139, in trainNewNetwork
        dataset_train_info, train_loss_values, val_loss_values = training.trainingNetwork(images_dir_train, labels_dir_train,
      File "/Users/eln/TagLab/models/training.py", line 297, in trainingNetwork
        state = torch.load("models/deeplab-resnet.pth.tar")
      File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 1040, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 1268, in _legacy_load
        result = unpickler.load()
      File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 1205, in persistent_load
        wrap_storage=restore_location(obj, location),
      File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 391, in default_restore_location
        result = fn(storage, location)
      File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 266, in _cuda_deserialize
        device = validate_cuda_device(location)
      File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 250, in validate_cuda_device
        raise RuntimeError('Attempting to deserialize object on a CUDA '
    RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I looked at source/MapClassifier.py, since it was mentioned in the previous discussion, and lines 98-101 already appear to call `torch.load` with "cpu" when `torch.cuda.is_available()` is False. So this does not seem to be the same problem that the previous user ran into and fixed.
The problem appears to arise in training.py, but I haven't figured out what it is yet. Any assistance would be appreciated!
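The traceback points at line 297 of training.py, where `torch.load` is called without `map_location`. A checkpoint saved from a GPU records a location such as `cuda:0` for each tensor, and `torch.load` tries to restore the tensors there, which fails when `torch.cuda.is_available()` is False. Below is a minimal sketch of a device-aware loader; `load_checkpoint` is a hypothetical helper, not a TagLab function.

```python
import torch


def load_checkpoint(source, device=None):
    """Load a checkpoint saved on any device onto `device`.

    Hypothetical helper (not part of TagLab). `source` may be a file path
    or a file-like object such as io.BytesIO.
    """
    if device is None:
        # Fall back to the CPU when no CUDA device is usable.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # map_location remaps every stored tensor (e.g. one saved as "cuda:0")
    # onto `device` instead of the device it was saved from.
    return torch.load(source, map_location=device)
```

In training.py the corresponding change would be `torch.load("models/deeplab-resnet.pth.tar", map_location=device)` with `device` chosen the same way, so a machine with a working GPU still loads onto the GPU.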
Originally posted by andieich January 26, 2023
Hi,
I successfully installed TagLab on a Windows computer, but had some issues. I cannot use the GPU because it is made by Intel, and CUDA only supports NVIDIA GPUs.
I therefore tried to install the CPU versions of torch and torchvision. When I use the install.py script, I get this error:
`append() takes exactly one argument (2 given)`
This is caused by line 234 of install.py. I commented out lines 232-236 and manually installed both packages with the following commands:

    pip install torch --extra-index-url https://download.pytorch.org/whl/cpu
    pip install torchvision --extra-index-url https://download.pytorch.org/whl/cpu

Afterwards, TagLab installed without any problems.
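The message `append() takes exactly one argument (2 given)` is the `TypeError` Python raises when `list.append` receives two items in one call. Line 234 of install.py is not quoted in the post, so the sketch below only assumes that it appends a pip option and its value together; `CPU_INDEX`, `reproduce_error`, and `build_pip_command` are hypothetical names.

```python
# Hypothetical sketch: install.py's actual line 234 is not shown in the post.
CPU_INDEX = "https://download.pytorch.org/whl/cpu"


def reproduce_error():
    """Return the message Python gives for list.append with two arguments."""
    try:
        [].append("--extra-index-url", CPU_INDEX)
    except TypeError as exc:
        return str(exc)
    return ""


def build_pip_command(package):
    """Build a pip command that installs `package` from the CPU wheel index."""
    cmd = ["pip", "install", package]
    # list.append accepts exactly one item; list.extend takes an iterable
    # and adds each element, so the option and its value both go in.
    cmd.extend(["--extra-index-url", CPU_INDEX])
    return cmd
```

Passing the resulting list to `subprocess.run(..., check=True)` would run the same command as the manual `pip install` lines.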
However, when trying to run an automatic segmentation, I got this error:

    RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

So I did as proposed and changed line 98 of source/MapClassifier.py from
`classifier.load_state_dict(torch.load(network_name))`
to
`classifier.load_state_dict(torch.load(network_name, map_location=torch.device('cpu')))`
Now, everything works fine. Maybe that's something to consider for the next TagLab version.
Thanks for this amazing software!
I fixed the issue by editing both training.py and losses.py to run on the CPU version of PyTorch. CUDA-specific calls appear in several places in both scripts:
In losses.py:
Line 28, add:

    USE_CUDA = torch.cuda.is_available()
    if USE_CUDA:
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")
    net.to(device)

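The same device choice can be written once as a small helper instead of being repeated in each function. This is a minimal sketch; `pick_device` and `DEVICE` are hypothetical names, and moving a network with `net.to(...)` is left to whichever module creates the network.

```python
import torch


def pick_device(prefer_cuda=True):
    """Return a CUDA device when one is usable, otherwise the CPU."""
    if prefer_cuda and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


# Computed once at import time so other functions can reuse it.
DEVICE = pick_device()
```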
Lines 40 and 62, change `dist_maps_tensor = dist_maps_tensor.to(device='cuda:0')` to `dist_maps_tensor = dist_maps_tensor.to(device)`
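Another option inside a loss function is to move auxiliary tensors to whatever device the predictions are already on, so the function needs neither a hardcoded `'cuda:0'` nor a global device. The sketch assumes a surface-loss-style term (the mean of class probabilities times distance maps); `surface_loss_sketch` is hypothetical and is not TagLab's actual `surface_loss`.

```python
import torch


def surface_loss_sketch(probs, dist_maps):
    """Mean of probs * dist_maps, computed on the device of `probs`.

    Hypothetical stand-in for a surface (boundary) loss; only the device
    handling is the point here.
    """
    # Follow the predictions' device and dtype instead of hardcoding
    # 'cuda:0' or 'cpu'; distance maps built with NumPy are often float64.
    dist_maps = dist_maps.to(device=probs.device, dtype=probs.dtype)
    return (probs * dist_maps).mean()
```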
In the surface_loss function, add:

    USE_CUDA = torch.cuda.is_available()
    if USE_CUDA:
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")

Line 80, change `one_hot = one_hot.to('cuda:0')` to `one_hot = one_hot.to('cpu')`
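Hardcoding `'cpu'` here works on a CPU-only machine, but on a GPU machine it would likely leave `one_hot` on a different device from the network's outputs. If the one-hot encoding is built from the label tensor, it can simply inherit that tensor's device. The sketch assumes integer label maps of shape (N, H, W); `one_hot_nchw` is a hypothetical helper, not the code at line 80.

```python
import torch
import torch.nn.functional as F


def one_hot_nchw(labels, num_classes):
    """(N, H, W) integer labels -> (N, C, H, W) float one-hot tensor.

    F.one_hot returns a tensor on the same device as `labels`, so no
    explicit .to('cuda:0') or .to('cpu') is needed.
    """
    one_hot = F.one_hot(labels.long(), num_classes=num_classes)  # (N, H, W, C)
    return one_hot.permute(0, 3, 1, 2).float()
```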
In training.py:

Line 103, add:

    else:
        device = torch.device("cpu")
        net.to(device)
        torch.cpu.synchronize()

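`torch.cuda.synchronize()` exists because CUDA kernels are launched asynchronously; PyTorch operations on CPU tensors have already finished when the Python call returns, so the CPU branch can simply skip synchronization. A minimal guard, where `synchronize_if_needed` is a hypothetical name:

```python
import torch


def synchronize_if_needed(device):
    """Wait for queued CUDA work on `device`; return True if it synced."""
    if device.type == "cuda":
        # CUDA kernels run asynchronously with respect to Python.
        torch.cuda.synchronize(device)
        return True
    # CPU tensor operations complete before the call returns,
    # so there is nothing to wait for.
    return False
```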
Line 297, change `state = torch.load("models/deeplab-resnet.pth.tar")` to `state = torch.load("models/deeplab-resnet.pth.tar", map_location=torch.device("cpu"))`
Line 333, change `class_weights = torch.FloatTensor(weights).cuda()` to `class_weights = torch.FloatTensor(weights).cpu()`
Line 445, remove `torch.cuda.empty_cache()`
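Calling `.cpu()` on the class weights suits only CPU-only machines: `nn.CrossEntropyLoss` expects its `weight` tensor on the same device as the logits. Likewise, `torch.cuda.empty_cache()` only concerns CUDA's cached memory, so it can be guarded rather than deleted. A sketch with hypothetical helper names `make_weighted_ce` and `release_cached_memory`:

```python
import torch
import torch.nn as nn


def make_weighted_ce(weights, device):
    """Cross-entropy loss whose class weights live on `device`."""
    # Create the weights directly on the target device instead of calling
    # .cuda() or .cpu(), so the same code runs on both kinds of machine.
    class_weights = torch.tensor(weights, dtype=torch.float32, device=device)
    return nn.CrossEntropyLoss(weight=class_weights)


def release_cached_memory(device):
    """Free CUDA's cached allocator blocks; nothing to do on the CPU."""
    if device.type == "cuda":
        torch.cuda.empty_cache()
```

With all-zero logits every sample's loss is log(2), and the weighted mean divides by the summed weights, so the result stays log(2) regardless of the class weights.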
Line 479, change `net.load_state_dict(torch.load(network_filename))` to `net.load_state_dict(torch.load(network_filename, map_location=torch.device("cpu")))`