Errors with non-CUDA machine

Question

Opened this issue 2 months ago · 1 comments

Discussed in #81

I am having a similar problem to the one described in the post quoted below, even though it appears that MapClassifier.py has been updated to incorporate the fix that the other user described.

When I try to use "Train Your Network", I get this error:

Traceback (most recent call last):
  File "TagLab.py", line 4139, in trainNewNetwork
    dataset_train_info, train_loss_values, val_loss_values = training.trainingNetwork(images_dir_train, labels_dir_train,
  File "/Users/eln/TagLab/models/training.py", line 297, in trainingNetwork
    state = torch.load("models/deeplab-resnet.pth.tar")
  File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 1040, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 1268, in _legacy_load
    result = unpickler.load()
  File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 1205, in persistent_load
    wrap_storage=restore_location(obj, location),
  File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 391, in default_restore_location
    result = fn(storage, location)
  File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 266, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/Users/eln/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 250, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I looked at source/MapClassifier.py since it was mentioned in the previous discussion. Lines 98-101 look like they should already call torch.load with "cpu" when torch.cuda.is_available() is False, so this does not appear to be the same problem that the previous user ran into and fixed.

The traceback points to line 297 of training.py, where torch.load is called without map_location, but I haven't figured out the full fix yet. Any assistance would be appreciated!
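The RuntimeError above recommends passing map_location to torch.load. A minimal sketch of that pattern follows. It is not TagLab's actual code: the helper names pick_device and load_checkpoint are hypothetical, and the checkpoint is a throwaway file created only so the example runs on its own.

```python
import os
import tempfile

import torch


def pick_device():
    """Return the CUDA device when one is available, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def load_checkpoint(path, device=None):
    """Load a checkpoint and remap its tensors onto `device`.

    Hypothetical helper: map_location makes torch.load place tensors that
    were saved from a GPU onto whichever device is actually present.
    """
    if device is None:
        device = pick_device()
    return torch.load(path, map_location=device)


# Demo with a throwaway checkpoint (stands in for a real .pth.tar file).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pth.tar")
    torch.save({"weight": torch.ones(2, 3)}, demo_path)
    state = load_checkpoint(demo_path, torch.device("cpu"))

print(state["weight"].device)
```

Passing the device chosen at run time, rather than hard-coding "cpu", keeps the same call working on machines that do have CUDA.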

Originally posted by andieich January 26, 2023
Hi,
I successfully installed TagLab on a Windows computer, but had some issues. I cannot use the GPU since it is made by Intel.
I therefore tried to install the CPU version of torch and torchvision. When I use the install.py script, I get this error:

append() takes exactly one argument (2 given)

This is caused by line 234. I commented out lines 232 - 236 and manually installed both packages with the following code:

pip install torch --extra-index-url https://download.pytorch.org/whl/cpu

pip install torchvision --extra-index-url https://download.pytorch.org/whl/cpu

Afterwards, I could install TagLab flawlessly.

However, when trying to run an automatic segmentation, I got this error:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

So I did as proposed and changed in source/MapClassifier.py in line 98:

classifier.load_state_dict(torch.load(network_name))

to

classifier.load_state_dict(torch.load(network_name, map_location=torch.device('cpu')))

Now, everything works fine. Maybe that's something to consider for the next TagLab version.
Thanks for this amazing software!

Answer 1 · 2024-04-17T21:24:05.000Z

I fixed the issue by editing both training.py and losses.py so that they run on the CPU. Changes were needed in several places in both scripts:

In losses.py:

Line 28, add:

    USE_CUDA = torch.cuda.is_available()
    if USE_CUDA:
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")
        net.to(device)

Lines 40 and 62, change `dist_maps_tensor = dist_maps_tensor.to(device='cuda:0')` to `dist_maps_tensor = dist_maps_tensor.to(device)`

In the surface_loss function, add:

    USE_CUDA = torch.cuda.is_available()
    if USE_CUDA:
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")

Line 80, change `one_hot = one_hot.to('cuda:0')` to `one_hot = one_hot.to('cpu')`
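As an alternative to hard-coding 'cpu' or 'cuda:0' in a loss function, a helper tensor can be moved to the device of a tensor the function already receives. The sketch below illustrates that idea only. The function one_hot_like and the tensor shapes are assumptions for illustration, not code from TagLab's losses.py.

```python
import torch
import torch.nn.functional as F


def one_hot_like(labels, num_classes, reference):
    """One-hot encode integer labels and place them on `reference`'s device.

    labels:    integer tensor of shape (N, H, W)
    reference: any tensor already on the target device (e.g. the logits)
    Returns a float tensor of shape (N, num_classes, H, W).
    """
    one_hot = F.one_hot(labels, num_classes)         # (N, H, W, C)
    one_hot = one_hot.permute(0, 3, 1, 2).float()    # (N, C, H, W)
    return one_hot.to(reference.device)


# Small demo on whatever device the logits live on (the CPU here).
logits = torch.randn(1, 3, 2, 2)
labels = torch.tensor([[[0, 1], [2, 1]]])
encoded = one_hot_like(labels, 3, logits)
print(encoded.shape, encoded.device)
```

Because the device is taken from the input, the same function runs unchanged on CPU-only and CUDA machines.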

In training.py:

Line 103, add:

    else:
        device = torch.device("cpu")
        net.to(device)
        torch.cpu.synchronize()

Line 297, change `state = torch.load("models/deeplab-resnet.pth.tar")` to `state = torch.load("models/deeplab-resnet.pth.tar", map_location=torch.device("cpu"))`

Line 333, change `class_weights = torch.FloatTensor(weights).cuda()` to `class_weights = torch.FloatTensor(weights).cpu()`

Line 445, remove `torch.cuda.empty_cache()`

Line 479, change `net.load_state_dict(torch.load(network_filename))` to `net.load_state_dict(torch.load(network_filename, map_location=torch.device("cpu")))`
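The training.py edits above pin everything to the CPU. A device-agnostic variant of the same changes might look like the sketch below, which selects the device once and reuses it. The weight values, the tiny Linear model, and the helper release_cached_memory are placeholders assumed for illustration; they are not taken from TagLab.

```python
import torch

# Select the device once; every later step reuses it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder class weights (assumed values, not TagLab's).
weights = [0.2, 0.3, 0.5]
# Replaces torch.FloatTensor(weights).cuda(): works on either device.
class_weights = torch.tensor(weights, dtype=torch.float32).to(device)

# Tiny stand-in model and batch, only to show the pieces fit together.
net = torch.nn.Linear(4, 3).to(device)
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)
inputs = torch.randn(5, 4, device=device)
targets = torch.tensor([0, 1, 2, 1, 0], device=device)
loss = criterion(net(inputs), targets)


def release_cached_memory(dev):
    """Call torch.cuda.empty_cache() only when running on CUDA.

    Returns True if the cache was released, False on CPU.
    """
    if dev.type == "cuda":
        torch.cuda.empty_cache()
        return True
    return False


released = release_cached_memory(device)
print(loss.item(), released)
```

Guarding the CUDA-only call instead of deleting it keeps GPU users' behavior unchanged while still letting the script run on CPU-only machines.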