Option for running on CPU
jakelawcheukwun opened this issue · 6 comments
When running api/run_example.py, we encountered the following error:
(myenv) ubuntu@ip-172-31-33-255:~/MIMAMO-Net$ python api/run_example.py
Traceback (most recent call last):
File "/home/ubuntu/MIMAMO-Net/api/resnet50_extractor.py", line 37, in __init__
self.model = self.model.to(device)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 443, in to
return self._apply(convert)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 203, in _apply
module._apply(fn)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 225, in _apply
param_applied = fn(param)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 441, in convert
return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
_check_driver()
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 51, in _check_driver
raise AssertionError("""
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "api/run_example.py", line 7, in <module>
tester = Tester(model_weight_path, batch_size=64, workers=8, quiet=True)
File "/home/ubuntu/MIMAMO-Net/api/tester.py", line 40, in __init__
self.resnet50_extractor = Resnet50_Extractor(benchmark_dir, model_name, feature_layer)
File "/home/ubuntu/MIMAMO-Net/api/resnet50_extractor.py", line 39, in __init__
torch.cuda.set_device(0)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 245, in set_device
torch._C._cuda_setDevice(device)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
_check_driver()
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 51, in _check_driver
raise AssertionError("""
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
It looks like calling torch.cuda.set_device(0) fails on a machine without an NVIDIA driver, so the code cannot run on CPU as is. We need an option to run the code on CPU-only instances, as GPU instances are too expensive to deploy in a development environment.
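A minimal sketch of the kind of option requested here, assuming only standard PyTorch calls. The helper name `select_device` and its `use_gpu` flag are hypothetical and not part of MIMAMO-Net; the toy `Linear` layer stands in for the real model.

```python
import torch


def select_device(use_gpu: bool = True) -> torch.device:
    """Return cuda:0 only when it is requested and actually available."""
    if use_gpu and torch.cuda.is_available():
        return torch.device("cuda:0")
    # Hypothetical message; either no GPU was found or the caller disabled it.
    print("GPU not used, running on CPU")
    return torch.device("cpu")


device = select_device(use_gpu=True)

# Toy stand-in for the real network: move it to whichever device was chosen.
model = torch.nn.Linear(8, 2).to(device)
x = torch.randn(4, 8, device=device)
with torch.no_grad():
    y = model(x)
```

Because the device is chosen once and passed around, no call to torch.cuda.set_device is needed on a CPU-only machine.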
Currently this API does not support CPU. I will fix it soon.
@wtomin I figured out a temporary solution: just remove the lines https://github.com/wtomin/MIMAMO-Net/blob/dev/apiv0/api/resnet50_extractor.py#L39 and https://github.com/wtomin/MIMAMO-Net/blob/dev/apiv0/api/resnet50_extractor.py#L40
With those lines removed, the code no longer tries to select a GPU and falls back to CPU.
Then we encountered this error (this seems to be the thing we have to fix):
(myenv) ubuntu@ip-172-31-33-255:~/MIMAMO-Net$ python api/run_example.py
No CUDA devices found, falling back to CPU
Traceback (most recent call last):
File "api/run_example.py", line 7, in <module>
tester = Tester(model_weight_path, batch_size=64, workers=8, quiet=True)
File "/home/ubuntu/MIMAMO-Net/api/tester.py", line 45, in __init__
checkpoint = torch.load(model_path)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 773, in _legacy_load
result = unpickler.load()
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 729, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 178, in default_restore_location
result = fn(storage, location)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 154, in _cuda_deserialize
device = validate_cuda_device(location)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 138, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
@chan-kh do you have time to test out the latest commits on https://github.com/wtomin/MIMAMO-Net/tree/dev/apiv0 today?
Let me know if you encounter problems with git operations (merge the latest commits to local repo on Ubuntu)
Hi @wtomin, I am still encountering some CPU-related errors with the latest commit.
`No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
Traceback (most recent call last):
File "run_example.py", line 7, in <module>
tester = Tester(model_weight_path, batch_size=64, workers=8, quiet=True)
File "/home/ubuntu/MIMAMO-Net/api/tester.py", line 47, in __init__
checkpoint = torch.load(model_path)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 773, in _legacy_load
result = unpickler.load()
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 729, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 178, in default_restore_location
result = fn(storage, location)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 154, in _cuda_deserialize
device = validate_cuda_device(location)
File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 138, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.`
I changed
checkpoint = torch.load(model_path)
to
checkpoint = torch.load(model_path, map_location=device)
I think it should work on CPU now.
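The change above can be tried in isolation with a toy checkpoint. This is only a sketch: the file name, the `load_checkpoint` helper, and the `Linear` model are made up for illustration and are not the actual MIMAMO-Net weights or code.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_checkpoint(path: str, device: torch.device) -> dict:
    """Load a checkpoint, remapping all stored tensors onto `device`."""
    return torch.load(path, map_location=device)


# Use the GPU when present, otherwise fall back to CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)  # toy stand-in for the real network
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_checkpoint.pth.tar")  # hypothetical file name
    torch.save({"epoch": 1, "state_dict": model.state_dict()}, path)
    checkpoint = load_checkpoint(path, device)

model.load_state_dict(checkpoint["state_dict"])
model.to(device)
```

With map_location set, tensors that were saved from a CUDA device are mapped to the chosen device at load time, which is what the RuntimeError message recommends for CPU-only machines.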
For the record, the latest run_example.py output with batch_size = 16 and num_workers = 0 is as follows. (Be aware that the valence and arousal values for some outputs differ from the values in the README by 0.000001.)
`(myenv) ubuntu@ip-172-31-34-134:~/MIMAMO-Net/api$ python run_example.py
No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
load checkpoint from models/model_weights.pth.tar, epoch:1
output dir exists: examples/utterance_1. Video processing skipped.
0%| | 0/20 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1587428091666/work/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of add is deprecated:
add(Tensor input, Number alpha, Tensor other, *, Tensor out)
Consider using one of the following signatures instead:
add(Tensor input, Tensor other, *, Number alpha, Tensor out)
100%|█████████████████████████████████████████████████████████| 20/20 [01:18<00:00, 3.95s/it]
Prediction takes 109.3640 seconds for 309 frames, average 0.3539 seconds for one frame.
utterance_1 predictions
valence arousal
0 0.573943 0.623365
1 0.563206 0.647939
2 0.539191 0.648681
3 0.524442 0.691737
4 0.380667 0.585094
.. ... ...
304 -0.128373 0.573663
305 -0.200220 0.520262
306 -0.083072 0.392294
307 -0.211972 0.374694
308 -0.290508 0.416637
[309 rows x 2 columns]`
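Differences of about 0.000001 between CPU and GPU runs, as noted above, are easiest to handle by comparing with a tolerance instead of exact equality. A small sketch: the first tuple is copied from row 0 of the output above, while the reference tuple is a hypothetical placeholder, not the actual README value.

```python
import math

# Row 0 from the CPU run above: (valence, arousal).
cpu_row = (0.573943, 0.623365)

# Hypothetical reference values for illustration only (NOT taken from the README).
reference_row = (0.573944, 0.623365)


def rows_match(a, b, abs_tol=2e-6):
    """True when every pair of values agrees within abs_tol."""
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a, b))


exact_equal = cpu_row == reference_row          # strict equality
close_enough = rows_match(cpu_row, reference_row)  # tolerance-based comparison
```

Here the strict comparison fails while the tolerance-based one passes, so a regression check written this way would not flag the last-digit differences.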