training on CPU

Question

coranholmes opened this issue 5 years ago · 1 comments

I am running your code on macOS and got the error below. I searched online and found that Macs do not have the nvidia-smi command, which ships with the NVIDIA drivers. I tried commenting out the line pw.nvidia_memory_map(gpu_index = gpu_index), and now the code runs. Is that the correct way to handle it?

[2019-10-02 16:09:23,892] Epoch: 0                                              
[2019-10-02 16:09:23,892] It's recommended to set ``CUDA_DEVICE_ORDER``to be ``PCI_BUS_ID`` by ``export CUDA_DEVICE_ORDER=PCI_BUS_ID``;otherwise, it's not guaranteed that the gpu index frompytorch to be consistent the ``nvidia-smi`` results.
Traceback (most recent call last):
  File "train_partial_ner.py", line 124, in <module>
    pw.nvidia_memory_map(gpu_index = gpu_index)
  File "/Users/weiling.chen/anaconda2/envs/py3/lib/python3.7/site-packages/torch_scope/wrapper.py", line 483, in nvidia_memory_map
    return basic_wrapper.nvidia_memory_map(use_logger = use_logger, gpu_index = gpu_index)
  File "/Users/weiling.chen/anaconda2/envs/py3/lib/python3.7/site-packages/torch_scope/wrapper.py", line 190, in nvidia_memory_map
    '--format=csv,noheader'])
  File "/Users/weiling.chen/anaconda2/envs/py3/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/Users/weiling.chen/anaconda2/envs/py3/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/Users/weiling.chen/anaconda2/envs/py3/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/Users/weiling.chen/anaconda2/envs/py3/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi': 'nvidia-smi'
Done.

Answer 1 · 2019-10-20T15:04:16.000Z

Hi, we recommend using a GPU for training, which will be faster. We haven't tested CPU training yet, and it would require several changes to the implementation. The change you made is necessary: that call is only used to print and log the GPU memory usage.
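
Instead of commenting the line out, the call could be guarded so that the same script still logs memory on machines that have the NVIDIA tools. The following is only a minimal sketch, not something from the repository: the helper names `has_tool` and `log_gpu_memory_if_available` are hypothetical, and it assumes that a missing `nvidia-smi` executable on the PATH is the only reason the call fails.

```python
import shutil


def has_tool(tool="nvidia-smi"):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(tool) is not None


def log_gpu_memory_if_available(log_fn, tool="nvidia-smi"):
    """Call log_fn only when the tool exists; return whether it was called.

    log_fn is any zero-argument callable, e.g. a lambda wrapping the
    memory-logging call from the traceback above.
    """
    if not has_tool(tool):
        return False
    log_fn()
    return True


# Hypothetical usage in train_partial_ner.py, replacing the failing line:
# log_gpu_memory_if_available(lambda: pw.nvidia_memory_map(gpu_index=gpu_index))
```

On a Mac without the NVIDIA drivers this skips the logging step silently. It does not address the other changes that CPU training may need.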