wtomin/MIMAMO-Net

Option for running on CPU

jakelawcheukwun opened this issue · 6 comments

When running api/run_example.py, we encountered the following error:

(myenv) ubuntu@ip-172-31-33-255:~/MIMAMO-Net$ python api/run_example.py 
Traceback (most recent call last):
  File "/home/ubuntu/MIMAMO-Net/api/resnet50_extractor.py", line 37, in __init__
    self.model = self.model.to(device)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 443, in to
    return self._apply(convert)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 203, in _apply
    module._apply(fn)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 225, in _apply
    param_applied = fn(param)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 441, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
    _check_driver()
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 51, in _check_driver
    raise AssertionError("""
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "api/run_example.py", line 7, in <module>
    tester = Tester(model_weight_path, batch_size=64, workers=8, quiet=True)
  File "/home/ubuntu/MIMAMO-Net/api/tester.py", line 40, in __init__
    self.resnet50_extractor =  Resnet50_Extractor(benchmark_dir, model_name, feature_layer)
  File "/home/ubuntu/MIMAMO-Net/api/resnet50_extractor.py", line 39, in __init__
    torch.cuda.set_device(0)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 245, in set_device
    torch._C._cuda_setDevice(device)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
    _check_driver()
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 51, in _check_driver
    raise AssertionError("""
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

It looks like torch.cuda.set_device(0) assumes a GPU is present and cannot fall back to the CPU. We need the option to run the code on CPU-only instances, as GPU instances are too expensive to deploy in a development environment.

Currently this API does not support CPU; I will fix it soon.

@wtomin I figured out a temporary solution: just remove the lines https://github.com/wtomin/MIMAMO-Net/blob/dev/apiv0/api/resnet50_extractor.py#L39 and https://github.com/wtomin/MIMAMO-Net/blob/dev/apiv0/api/resnet50_extractor.py#L40.
With those lines removed, the code no longer forces a CUDA device, so PyTorch detects that there is no GPU and falls back to the CPU.
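For reference, here is a minimal sketch of the CPU-fallback idiom (variable names and structure are illustrative, not the actual resnet50_extractor.py code):

```python
import torch

# Prefer CUDA when a driver and device are available, otherwise fall back to CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

if device.type == 'cuda':
    # Only touch the CUDA API when a GPU actually exists.
    torch.cuda.set_device(device)
else:
    print('No CUDA devices found, falling back to CPU')

# model = model.to(device)  # works the same way on GPU and CPU
```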

Then we encountered this error (this seems to be the thing we have to fix):

(myenv) ubuntu@ip-172-31-33-255:~/MIMAMO-Net$ python api/run_example.py 
No CUDA devices found, falling back to CPU
Traceback (most recent call last):
  File "api/run_example.py", line 7, in <module>
    tester = Tester(model_weight_path, batch_size=64, workers=8, quiet=True)
  File "/home/ubuntu/MIMAMO-Net/api/tester.py", line 45, in __init__
    checkpoint = torch.load(model_path)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 773, in _legacy_load
    result = unpickler.load()
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 729, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 178, in default_restore_location
    result = fn(storage, location)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 154, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 138, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

@chan-kh do you have time to test out the latest commits on https://github.com/wtomin/MIMAMO-Net/tree/dev/apiv0 today?

Let me know if you encounter problems with git operations (merging the latest commits into the local repo on Ubuntu).

Hi @wtomin, I am still encountering some CPU-related errors with our latest commit.

No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
Traceback (most recent call last):
  File "run_example.py", line 7, in <module>
    tester = Tester(model_weight_path, batch_size=64, workers=8, quiet=True)
  File "/home/ubuntu/MIMAMO-Net/api/tester.py", line 47, in __init__
    checkpoint = torch.load(model_path)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 773, in _legacy_load
    result = unpickler.load()
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 729, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 178, in default_restore_location
    result = fn(storage, location)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 154, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/serialization.py", line 138, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I changed
`checkpoint = torch.load(model_path)`
to
`checkpoint = torch.load(model_path, map_location=device)`.

I think it should work on CPU now.
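For anyone hitting the same RuntimeError, a minimal sketch of the map_location change (the path matches the checkpoint used in the example output below; the device selection mirrors the error message's suggestion rather than the exact tester.py code):

```python
import torch

# Load on GPU when available, otherwise on CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model_path = 'models/model_weights.pth.tar'

# map_location remaps tensors that were saved on a CUDA device onto `device`,
# so a GPU-trained checkpoint can be loaded on a CPU-only machine.
checkpoint = torch.load(model_path, map_location=device)
```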

Noting down that the latest run_example.py output with batch_size = 16 and num_workers = 0 is the following. (Be aware that the valence and arousal values for some outputs differ from the values in the README by 0.000001.)

(myenv) ubuntu@ip-172-31-34-134:~/MIMAMO-Net/api$ python run_example.py 
No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
load checkpoint from models/model_weights.pth.tar, epoch:1
output dir exists: examples/utterance_1. Video processing skipped.
  0%|                                                                  | 0/20 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1587428091666/work/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of add is deprecated:
	add(Tensor input, Number alpha, Tensor other, *, Tensor out)
Consider using one of the following signatures instead:
	add(Tensor input, Tensor other, *, Number alpha, Tensor out)
100%|█████████████████████████████████████████████████████████| 20/20 [01:18<00:00,  3.95s/it]
Prediction takes 109.3640 seconds for 309 frames, average 0.3539 seconds for one frame.
utterance_1 predictions
      valence   arousal
0    0.573943  0.623365
1    0.563206  0.647939
2    0.539191  0.648681
3    0.524442  0.691737
4    0.380667  0.585094
..        ...       ...
304 -0.128373  0.573663
305 -0.200220  0.520262
306 -0.083072  0.392294
307 -0.211972  0.374694
308 -0.290508  0.416637

[309 rows x 2 columns]