bitsandbytes-foundation/bitsandbytes

CUDA is available, but importing bitsandbytes fails

ZeroneBo opened this issue · 2 comments

System Info

CentOS Linux release 7.8.2003 (Core)
NVIDIA A100-PCIE-40GB 1gpu
NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2

nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.6.68
nvidia-nvtx-cu12         12.1.105
torch                    2.4.1
triton                   3.0.0
trl                      0.9.6
transformers             4.44.2
bitsandbytes==0.44.0.dev0, also tried 0.39.0, etc.
Could not find the bitsandbytes CUDA binary at PosixPath('/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/libbitsandbytes_cuda121.so')
Could not load bitsandbytes native library: /public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/cextension.py", line 104, in <module>
    lib = get_native_library()
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/cextension.py", line 91, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory

CUDA Setup failed despite CUDA being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

The directory listed in your path is found to be non-existent: /opt/gridview/slurm/lib64
The directory listed in your path is found to be non-existent: /usr/local/cuda/lib
The directory listed in your path is found to be non-existent: /opt/gridview/slurm/lib64
The directory listed in your path is found to be non-existent: /usr/local/cuda/lib
The directory listed in your path is found to be non-existent: /opt/gridview/slurm/lib64
The directory listed in your path is found to be non-existent: /opt/gridview/clusquota/man
The directory listed in your path is found to be non-existent: /opt/gridview/clusquota/man
The directory listed in your path is found to be non-existent: /opt/gridview/clusquota/man
The directory listed in your path is found to be non-existent: /public/home/sb/perl5/lib/perl5
The directory listed in your path is found to be non-existent: --install_base /public/home/sb/perl5
The directory listed in your path is found to be non-existent: /public/home/sb/.vscode-server/cli/servers/Stable-89de5a8d4d6205e5b11647eb6a74844ca23d2573/server/extensions/git/dist/askpass-main.js
The directory listed in your path is found to be non-existent: /opt/clusconf
The directory listed in your path is found to be non-existent: /run/user/2014/vscode-git-80b887459b.sock
The directory listed in your path is found to be non-existent: /run/user/2014/vscode-ipc-f1d5091b-075d-4c0b-8252-1bbfe64ebf5a.sock
The directory listed in your path is found to be non-existent: /public/home/sb/.vscode-server/cli/servers/Stable-89de5a8d4d6205e5b11647eb6a74844ca23d2573/server/bin/helpers/browser.sh
The directory listed in your path is found to be non-existent: /public/home/sb/.vscode-server/cli/servers/Stable-89de5a8d4d6205e5b11647eb6a74844ca23d2573/server/node
The directory listed in your path is found to be non-existent: /public/home/sb/.vscode-server/cli/servers/Stable-89de5a8d4d6205e5b11647eb6a74844ca23d2573/server/extensions/git/dist/askpass.sh
The directory listed in your path is found to be non-existent: INSTALL_BASE=/public/home/sb/perl5
Traceback (most recent call last):
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/diagnostics/main.py", line 66, in main
    sanity_check()
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/diagnostics/main.py", line 40, in sanity_check
    adam.step()
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/torch/optim/optimizer.py", line 484, in wrapper
    out = func(*args, **kwargs)
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/optim/optimizer.py", line 287, in step
    self.update_step(group, p, gindex, pindex)
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/optim/optimizer.py", line 500, in update_step
    F.optimizer_update_32bit(
  File "/public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/functional.py", line 1604, in optimizer_update_32bit
    optim_func = str2optimizer32bit[optimizer_name][0]
NameError: name 'str2optimizer32bit' is not defined
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(8, 0), cuda_version_string='121', cuda_version_tuple=(12, 1))
PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: (8, 0).
Library not found: /public/home/sb/anaconda3/envs/ft/lib/python3.10/site-packages/bitsandbytes-0.44.0.dev0-py3.10-linux-x86_64.egg/bitsandbytes/libbitsandbytes_cuda121.so. Maybe you need to compile it from source?
If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION`,
for example, `make CUDA_VERSION=113`.

The CUDA version for the compile might depend on your conda install, if using conda.
Inspect CUDA version via `conda list | grep cuda`.
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
Found duplicate CUDA runtime files (see below).

We select the PyTorch default CUDA runtime, which is 12.1,
but this might mismatch with the CUDA version that is needed for bitsandbytes.
To override this behavior set the `BNB_CUDA_VERSION=<version string, e.g. 122>` environmental variable.

For example, if you want to use the CUDA version 122,
    BNB_CUDA_VERSION=122 python ...

OR set the environmental variable in your .bashrc:
    export BNB_CUDA_VERSION=122

In the case of a manual override, make sure you set LD_LIBRARY_PATH, e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2,
* Found CUDA runtime at: /usr/local/cuda/lib64/libcudart.so
* Found CUDA runtime at: /usr/local/cuda/lib64/libcudart.so.12
* Found CUDA runtime at: /usr/local/cuda/lib64/libcudart.so.12.2.140
* Found CUDA runtime at: /usr/local/cuda/lib64/libcudart.so
* Found CUDA runtime at: /usr/local/cuda/lib64/libcudart.so.12
* Found CUDA runtime at: /usr/local/cuda/lib64/libcudart.so.12.2.140
* Found CUDA runtime at: /public/home/sb/anaconda3/envs/ft/lib/libcudart.so
* Found CUDA runtime at: /public/home/sb/anaconda3/envs/ft/lib/libcudart.so.12.6.68
* Found CUDA runtime at: /public/home/sb/anaconda3/envs/ft/lib/libcudart.so.12
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.

For source installations, compile the binaries with `cmake -DCOMPUTE_BACKEND=cuda -S .`.
See the documentation for more details if needed.

Trying a simple check anyway, but this will likely fail...
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.

Reproduction

python -m bitsandbytes

Expected behavior

CUDA works with torch and transformers, but not with bitsandbytes. I expect to be able to import and use bitsandbytes without errors.

I faced a similar issue.
You could try building and installing from source. Then add the corresponding /lib and /bin directories to LD_LIBRARY_PATH and PATH, respectively.
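
One way to check whether a source build actually produced the binary that the log reports as missing is to compare the expected file name against what is on disk. The sketch below is only an illustration: `expected_library_name` and `find_native_libraries` are hypothetical helpers, not part of bitsandbytes, and the naming pattern `libbitsandbytes_cuda<version>.so` is inferred from the file names in the error output above.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def expected_library_name(cuda_version: Tuple[int, int], override: Optional[str] = None) -> str:
    """Build the file name seen in the log, e.g. (12, 1) -> libbitsandbytes_cuda121.so.

    `override` stands in for a BNB_CUDA_VERSION-style string such as "122".
    Hypothetical helper: the pattern is inferred from the error output.
    """
    version = override if override else f"{cuda_version[0]}{cuda_version[1]}"
    return f"libbitsandbytes_cuda{version}.so"


def find_native_libraries(package_dir: Path) -> List[str]:
    """Return the sorted names of libbitsandbytes*.so files in package_dir."""
    return sorted(p.name for p in package_dir.glob("libbitsandbytes*.so"))


if __name__ == "__main__":
    import importlib.util

    # find_spec locates a top-level package without importing it,
    # so this works even when `import bitsandbytes` itself fails.
    spec = importlib.util.find_spec("bitsandbytes")
    if spec is None or not spec.submodule_search_locations:
        print("bitsandbytes is not installed in this environment")
    else:
        package_dir = Path(list(spec.submodule_search_locations)[0])
        wanted = expected_library_name((12, 1))
        present = find_native_libraries(package_dir)
        print(f"package directory: {package_dir}")
        print(f"binaries present:  {present or 'none'}")
        print(f"{wanted} found: {wanted in present}")
```

If the printed package directory is not the source checkout that was just built, the environment is probably still picking up an older install, which would explain seeing the same error after rebuilding.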

Hi @tanvisharma, thank you for your reply. I tried installing from source, following https://huggingface.co/docs/bitsandbytes/main/en/installation?backend=Apple+Silicon+%28MPS%29&source=Linux#installation, with these commands:

git clone https://github.com/TimDettmers/bitsandbytes.git && cd bitsandbytes/
pip install -r requirements-dev.txt

and

bash install_cuda.sh 121 ~/cuda-121
export BNB_CUDA_VERSION=121
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/YOUR_USERNAME/local/cuda-11.7
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/cuda-121/cuda-12.1/lib64
export PATH=~/cuda-121/cuda-12.1/bin

Then I ran pip install -e ., but it seems to report the same error.
Could you provide a more detailed solution?
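
A detail worth checking in setups like this: a shell assignment of the form `export PATH=<dir>` replaces the whole search path rather than extending it, so commands such as `python` or `pip` may no longer resolve afterwards. The sketch below shows the prepend pattern in Python for preparing a child-process environment. `prepend_to_path_var` is a hypothetical helper, and the directories in the usage lines are simply the ones mentioned in this thread, assumed to exist on the machine.

```python
import os
from typing import Dict, Mapping


def prepend_to_path_var(env: Mapping[str, str], name: str, new_dir: str) -> Dict[str, str]:
    """Return a copy of env with new_dir placed first in the path-like variable `name`.

    Existing entries are kept; new_dir is not duplicated if already present.
    """
    updated = dict(env)
    current = [p for p in updated.get(name, "").split(os.pathsep) if p]
    expanded = os.path.expanduser(new_dir)
    updated[name] = os.pathsep.join([expanded] + [p for p in current if p != expanded])
    return updated


if __name__ == "__main__":
    env = dict(os.environ)
    # Assumed install locations, taken from the commands quoted in this thread.
    env = prepend_to_path_var(env, "PATH", "~/cuda-121/cuda-12.1/bin")
    env = prepend_to_path_var(env, "LD_LIBRARY_PATH", "~/cuda-121/cuda-12.1/lib64")
    env["BNB_CUDA_VERSION"] = "121"
    # Pass `env` to subprocess.run([...], env=env) to start a process with these settings.
    print(env["PATH"].split(os.pathsep)[0])
```

The equivalent shell idiom keeps the existing value by referencing `$PATH` in the assignment, as the `LD_LIBRARY_PATH` lines above already do.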