CERN/TIGRE

Compile fails when installing TIGRE on the server

GreameLee opened this issue · 8 comments

When I try to install TIGRE Python on the supercomputer, compilation fails with this error:

(ss) exouser@sit-new:~/SiT/TIGRE/Python$ pip install .
Processing /home/exouser/SiT/TIGRE/Python
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      Traceback (most recent call last):
        File "/home/exouser/.conda/envs/ss/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/exouser/.conda/envs/ss/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/exouser/.conda/envs/ss/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-ahxu88yn/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-ahxu88yn/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-ahxu88yn/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-ahxu88yn/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 127, in <module>
        File "<string>", line 90, in locate_cuda
      OSError: CUDA_HOME or CUDA_PATH not set
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

OS: Ubuntu 22.04
GCC: x86_64-linux-gnu-gcc-12
CUDA: 12.2

I guess there is some problem with setup.py. The CUDA versions available on the server are:

(ss) exouser@sit-new:~/SiT/TIGRE/Python$ module spider cuda

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  nvhpc/23.11:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Versions:
        nvhpc/23.11/nvhpc-byo-compiler
        nvhpc/23.11/nvhpc-hpcx-cuda11
        nvhpc/23.11/nvhpc-hpcx-cuda12
        nvhpc/23.11/nvhpc-hpcx
        nvhpc/23.11/nvhpc-nompi
        nvhpc/23.11/nvhpc-openmpi3
        nvhpc/23.11/nvhpc

nvidia-smi works and torch.cuda.is_available() also returns True, so CUDA has been installed.

Hello, Ander, the original error is:

(ss) exouser@sit-new:~/SiT/TIGRE/Python$ pip install .
Processing /home/exouser/SiT/TIGRE/Python
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      Traceback (most recent call last):
        File "/home/exouser/.conda/envs/ss/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/exouser/.conda/envs/ss/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/exouser/.conda/envs/ss/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-9xseo1re/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-9xseo1re/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-9xseo1re/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-9xseo1re/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 125, in <module>
        File "<string>", line 106, in locate_cuda
        File "<string>", line 60, in get_cuda_version
        File "/home/exouser/.conda/envs/ss/lib/python3.8/posixpath.py", line 76, in join
          a = os.fspath(a)
      TypeError: expected str, bytes or os.PathLike object, not NoneType
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
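Both tracebacks look consistent with the same root cause: setup.py could not find a CUDA toolkit location. The first one says so directly ("CUDA_HOME or CUDA_PATH not set"); the second one fails later, when a path is built from a value that is None. The sketch below reproduces that second failure in isolation and shows a guard that turns it into the clearer message. The function name cuda_subpath is hypothetical and is not TIGRE's actual code.

```python
import os


def cuda_subpath(cuda_home, *parts):
    """Join parts onto cuda_home, failing with a clear message when it is unset.

    Hypothetical helper for illustration only; not taken from TIGRE's setup.py.
    """
    if cuda_home is None:
        # Same message as the first traceback in this thread.
        raise OSError("CUDA_HOME or CUDA_PATH not set")
    return os.path.join(cuda_home, *parts)


if __name__ == "__main__":
    # Without the guard, joining onto None raises a TypeError,
    # which is what the second traceback shows.
    try:
        os.path.join(None, "bin")
    except TypeError as exc:
        print("unguarded:", exc)

    # With the guard, the failure names the actual problem.
    try:
        cuda_subpath(os.environ.get("CUDA_HOME"), "bin", "nvcc")
    except OSError as exc:
        print("guarded:", exc)
```

Either way, the fix is on the system side: a full CUDA toolkit has to be installed and its location made visible to the build.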

CUDA, when packaged with PyTorch, is not always the full CUDA toolkit.

Can you run "which nvcc"? If there is no nvcc, then you have not installed CUDA. You can check the installation instructions for how to do it.
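The same check can be scripted from Python, which is handy inside a conda environment. This is a minimal sketch, not part of TIGRE: find_nvcc is a hypothetical helper, and it assumes the usual toolkit layout where the compiler sits at bin/nvcc under CUDA_HOME or CUDA_PATH. If neither variable points at a compiler, it falls back to searching PATH, which is what "which nvcc" does.

```python
import os
import shutil


def find_nvcc(env=None):
    """Return a path to nvcc, or None if no CUDA compiler can be found.

    Hypothetical helper: checks CUDA_HOME and CUDA_PATH first (assuming the
    compiler is at <root>/bin/nvcc), then searches PATH like `which nvcc`.
    """
    env = os.environ if env is None else env
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if root:
            candidate = os.path.join(root, "bin", "nvcc")
            if os.path.isfile(candidate):
                return candidate
    return shutil.which("nvcc", path=env.get("PATH"))


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc:
        print("nvcc found at:", nvcc)
    else:
        print("nvcc not found; the CUDA toolkit is probably not installed")
```

A working nvidia-smi or a True from torch.cuda.is_available() does not change this result, since neither requires the compiler to be present.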

This server does not support the "nvcc -V" command.

Apologies, I don't understand what you mean by that last comment. If the server has an NVIDIA GPU, then it supports nvcc. But maybe it's not installed, which is what I am trying to figure out, to help.

nvcc is not installed. I tried to install it before with:
sudo apt install nvidia-cuda-toolkit
But that changes the whole virtual environment, and nvidia-smi can no longer be used.

So I created a new account to log in to the supercomputer.

CUDA is not a Python package. When you install the runtime libraries with e.g. PyTorch, they come in a virtual environment, but the raw CUDA compiler, i.e. nvcc, cannot be installed "in an environment", in the same way that gcc cannot.

Please, instead of trying random things, follow the instructions in TIGRE to install CUDA: https://developer.nvidia.com/cuda-downloads

If you had done so, we would not need to have this conversation :) Much easier for both of us!