microsoft/DeepSpeed

Error building cpu_adam

arijitthegame opened this issue · 2 comments

Hi,

I apologize if this is a duplicate issue. I just pip-installed DeepSpeed with PyTorch 1.11, but I am still having issues building cpu_adam.

python -c "import deepspeed; deepspeed.ops.op_builder.CPUAdamBuilder().load() "
Installed CUDA version 10.0 does not match the version torch was compiled with 10.2 but since the APIs are compatible, accepting this combination
Using /home/ubuntu/.cache/torch_extensions/py38_cu102 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py38_cu102/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include -isystem /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/TH -isystem /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -c /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
FAILED: cpu_adam.o 
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include -isystem /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/TH -isystem /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -c /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
In file included from /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:3:0,
                 from /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
                 from /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
                 from /home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:5:
/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:10:10: fatal error: Python.h: No such file or directory
 #include <Python.h>
          ^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 470, in load
    return self.jit_load(verbose)
  File "/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 512, in jit_load
    op_module = load(
  File "/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1144, in load
    return _jit_compile(
  File "/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/ubuntu/nlp_prompting/env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'

What is the easiest way to solve it?

Hi @arijitthegame, thanks for reporting your issue. The build fails because the compiler cannot find Python.h. DeepSpeed compiles several custom CUDA/C++ kernels that have Python bindings, so the Python development headers (python-dev) need to be installed.

Can you try installing python-dev? Examples: https://stackoverflow.com/questions/21530577/fatal-error-python-h-no-such-file-or-directory

Thank you so much for your reply. When I try to install python-dev, I get
python3-dev is already the newest version (3.6.7-1~18.04).

EDIT: I just needed to install the correct python3.x-dev and it works now. Thank you so much!!
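
For anyone hitting the same error: the log above shows the build looking in /usr/include/python3.8 while the installed python3-dev was 3.6.7, so the headers did not match the interpreter in use. Here is a small sketch for checking whether Python.h exists for the interpreter you are actually running. The helper names are made up for illustration, and the `pythonX.Y-dev` package name assumes Debian/Ubuntu naming; other distributions name the package differently.

```python
import os
import sys
import sysconfig


def find_python_header(include_dir=None):
    """Return the path to Python.h if it exists in include_dir, else None.

    By default, checks the include directory reported by sysconfig for
    the interpreter running this script.
    """
    if include_dir is None:
        include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return header if os.path.isfile(header) else None


def suggested_dev_package(version_info=sys.version_info):
    """Guess the headers package name (assumption: Debian/Ubuntu naming)."""
    return "python{}.{}-dev".format(version_info[0], version_info[1])


if __name__ == "__main__":
    header = find_python_header()
    if header:
        print("Found", header)
    else:
        print("Python.h not found in", sysconfig.get_paths()["include"])
        print("On Debian/Ubuntu, try: sudo apt-get install",
              suggested_dev_package())
```

Run it with the same interpreter (for example, inside the same virtualenv) that you use for DeepSpeed, since the generic python3-dev package may target a different Python version than the one your environment uses.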