8-bit precision not working on Windows
Closed this issue · 26 comments
It seems like it doesn't want to work on Windows and is unable to detect my CUDA installation.
(textgen) C:\Users\pasil\text-generation-webui>python server.py --cai-chat --load-in-8bit
Warning: chat mode currently becomes a lot slower with text streaming on.
Consider starting the web UI with the --no-stream option.
Loading pygmalion-6b_dev...
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C')}
warn(msg)
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: C:\Users\pasil\anaconda3\envs\textgen did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py:134: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
File "C:\Users\pasil\text-generation-webui\server.py", line 235, in <module>
model, tokenizer = load_model(model_name)
File "C:\Users\pasil\text-generation-webui\server.py", line 109, in load_model
model = eval(command)
File "<string>", line 1, in <module>
File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\transformers\models\auto\auto_factory.py", line 463, in from_pretrained
return model_class.from_pretrained(
File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 2279, in from_pretrained
from .utils.bitsandbytes import get_keys_to_not_convert, replace_8bit_linear
File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\transformers\utils\bitsandbytes.py", line 10, in <module>
import bitsandbytes as bnb
File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\__init__.py", line 7, in <module>
from .autograd._functions import (
File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\autograd\_functions.py", line 8, in <module>
import bitsandbytes.functional as F
File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\functional.py", line 17, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\cextension.py", line 22, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues
bitsandbytes currently does not support Windows, but there are some workarounds.
This is one of them: bitsandbytes-foundation/bitsandbytes#30
Thanks, I managed to somehow make it work.
How did you manage to make it work? can you share the method?
Basically, you have to download these 2 dll files from here, then move them into anaconda3\env\textgen\Lib\site-packages\bitsandbytes
(assuming you're using conda). After that you have to edit one file in anaconda3\env\textgen\Lib\site-packages\bitsandbytes\cuda_setup:
edit main.py as follows.
Change
ct.cdll.LoadLibrary(binary_path)
to
ct.cdll.LoadLibrary(str(binary_path))
(two occurrences in the file).
Then replace
if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', None, None, None, None
with
if torch.cuda.is_available(): return 'libbitsandbytes_cuda116.dll', None, None, None, None
After that it should let you load models in 8-bit precision.
EDIT: I celebrated too early; it gives me a cuBLAS error when trying to generate, lol
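For reference, here is a minimal sketch of what the patched logic in cuda_setup\main.py amounts to (the function names below are illustrative, not the actual bitsandbytes ones, and the surrounding code varies by version):

import ctypes as ct
from pathlib import Path

import torch

def pick_binary_name() -> str:
    # Patched check: the stock code returned 'libsbitsandbytes_cpu.so'
    # when CUDA was NOT available; the patch instead returns the Windows
    # CUDA DLL whenever CUDA IS available.
    if torch.cuda.is_available():
        return 'libbitsandbytes_cuda116.dll'
    return 'libbitsandbytes_cpu.so'

def load_binary(package_dir: Path):
    # str() is the other edit: ctypes cannot take a raw WindowsPath, which
    # is where "argument of type 'WindowsPath' is not iterable" comes from.
    binary_path = package_dir / pick_binary_name()
    return ct.cdll.LoadLibrary(str(binary_path))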
@minipasila THANK YOU SO MUCH! Your instructions + the prebuilt bitsandbytes for older GPUs https://github.com/james-things/bitsandbytes-prebuilt-all_arch are helping me run Pygmalion 2.7B on my GTX 1060 6GB, and it's taking only 3.8 GB VRAM (of which probably 0.4 is being used by the system, as I don't have integrated graphics).
@minipasila Thank you for sharing; your instructions worked perfectly for GPT-J-6B on a 3070 Ti.
For future reference, the 8-bit Windows fix required me to navigate to my Python310 install folder instead of the env, as bitsandbytes was not installed in the conda env.
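If you are not sure which Python installation bitsandbytes actually landed in, one way to check (a sketch; run it with the same interpreter you start server.py with) is to locate the package without importing it, since importing would trigger the CUDA setup error on a broken install:

import importlib.util
import os

# find_spec locates the package without executing its __init__.py,
# so it works even while "import bitsandbytes" is failing.
spec = importlib.util.find_spec("bitsandbytes")
if spec is None:
    print("bitsandbytes is not installed for this interpreter")
else:
    # This is the folder the DLLs and the main.py patch belong in.
    print(os.path.dirname(spec.origin))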
For anybody still having trouble, you can try using the newer library: https://github.com/james-things/bitsandbytes-prebuilt-all_arch
Using v37 did it for me finally :)
I still have the same issue. I tried everything linked except the v37 fix; I downloaded the DLL and put it in the bitsandbytes folder. What next?
Just change libbitsandbytes_cuda116.dll
to libbitsandbytes_cudaall.dll
in anaconda3\env\textgen\Lib\site-packages\bitsandbytes\cuda_setup\main.py
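Put differently, the same patched check from before now points at the all-architecture build (a sketch reusing the illustrative function from the earlier example):

import torch

def pick_binary_name() -> str:
    # Same patched check as before, but returning the all-architecture
    # DLL from the prebuilt repo instead of the CUDA 11.6 build.
    if torch.cuda.is_available():
        return 'libbitsandbytes_cudaall.dll'
    return 'libbitsandbytes_cpu.so'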
I followed the suggestions above, but when I try to run it with 8-bit precision I get an error window popping up: "Bad Image
python.exe libbitsandbytes_cuda116.dll is either not designed to run on Windows or it contains an error. Try installing the program again using the original installation media or contact your system administrator or the software vendor for support. Error status 0xc000012f."
EDIT:
Okay, this is weird. I copied the DLL from my stable diffusion bitsandbytes folder and it seems to work now.
I got the same error before.
Now I copied cudaall.dll from the stable diffusion bitsandbytes folder instead of cuda116.dll.
It started, even though nothing else had worked!
When attempting to generate in 8-bit using the new libraries suggested by VertexMachine, I get this error:
C:\oobabooga\installer_files\env\lib\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py:195: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorCompare.cpp:413.) attn_weights = torch.where(causal_mask, attn_weights, mask_value)
Loading llama-7b-hf...
Traceback (most recent call last):
  File "E:\anaen\textgen\text-generation-webui\server.py", line 191, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "E:\anaen\textgen\text-generation-webui\modules\models.py", line 130, in load_model
    model = eval(command)
  File "<string>", line 1, in <module>
  File "E:\anaen\textgen\lib\site-packages\transformers\models\auto\auto_factory.py", line 434, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "E:\anaen\textgen\lib\site-packages\transformers\models\auto\configuration_auto.py", line 873, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "E:\anaen\textgen\lib\site-packages\transformers\models\auto\configuration_auto.py", line 579, in __getitem__
    raise KeyError(key)
KeyError: 'llama'
How do I fix it?
How do you find the stable diffusion bitsandbytes folder?
It's in this folder: stable-diffusion-webui\venv\Lib\site-packages\bitsandbytes
8-bit should work out of the box with the new one-click installer
https://github.com/oobabooga/text-generation-webui#one-click-installers
Please review the response posted by @PhyX-Meow in yuk7/ArchWSL#248. As he points out, it really has nothing to do with your Linux install; it's a simple fix in Windows. The solution he posts is for Arch, but the fix is exactly the same for Ubuntu, etc., in a WSL2 install. The issue is that Windows delivers libcuda.so, libcuda.so.1, and libcuda.so.1.1 as fully separate copies of the same file. The fix is to remove libcuda.so and libcuda.so.1 and make symlinks for each of them to libcuda.so.1.1.
Run a command-line shell as Administrator (type "cmd" to get a non-PowerShell command line).
Then type the following commands to create the symbolic links:
C:
cd \Windows\System32\lxss\lib
del libcuda.so
del libcuda.so.1
mklink libcuda.so libcuda.so.1.1
mklink libcuda.so.1 libcuda.so.1.1
When you're done, it will look like this:
C:\Windows\System32\lxss\lib> DIR
...
Directory of C:\Windows\System32\lxss\lib
03/15/2022 03:59 PM    <SYMLINK>    libcuda.so [libcuda.so.1.1]
03/15/2022 04:00 PM    <SYMLINK>    libcuda.so.1 [libcuda.so.1.1]
Then just finish the command you were running; in my case, the solution was to run "apt reinstall libc-bin", because libc-bin was the package hitting errors when I had run "apt upgrade -y". The error I received from the "apt upgrade -y" command was two lines:
#> apt upgrade -y
... < stuff deleted > ...
Processing triggers for libc-bin (2.31-0ubuntu9.7) ...
/sbin/ldconfig.real: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link
... < stuff deleted > ...
As per @PhyX-Meow:
"Actually this is not related to Arch, nor ArchWSL. It's caused by libcuda.so in your C:\Windows\System32\lxss\lib\ folder not being a symbolic link; it is installed by the NVIDIA driver. One solution to [remove] the warning is to delete libcuda.so and libcuda.so.1 and make symbolic links to libcuda.so.1.1 with the mklink command. Note the command does not work in PowerShell; you should use cmd.exe."
:)
This seems to be solving the issue for me; still working on it.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB. GPU 0 has a total capacity of 16.00 GiB of which 0 bytes is free. Of the allocated memory 15.06 GiB is allocated by PyTorch, and 54.28 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
It means you ran out of memory (try 4-bit precision, another quantization method like GPTQ/EXL2/GGUF, or a smaller model), but this error is unrelated to this issue. (Sorry for another notification.)
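For what it's worth, here is a hedged sketch of 4-bit loading with a recent transformers + bitsandbytes (the model name is only a placeholder; substitute your own):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # roughly halves memory vs. 8-bit
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "PygmalionAI/pygmalion-6b",            # placeholder model name
    quantization_config=quant_config,
    device_map="auto",                     # offload layers if VRAM runs short
)
tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")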
Is it not compatible? Even slower is faster than llama...
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
OK, this is a different error, so you should probably give a bit more information about your issue, like how you installed textgen and what your setup is. Potentially just make a new issue, because this is probably different from the one I originally opened this issue for.
exllamav2 was loaded directly; it is also CUDA 12.4, but it does not support the M40 GPU's Maxwell architecture (compute capability 5.2). (See the capability check after the log below.)
Running on local URL: http://127.0.0.1:7860/
01:55:11-441029 INFO Loading "14b-exl"
01:55:12-675028 ERROR Failed to load the model.
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 366, in ExLlamav2_loader
    from modules.exllamav2 import Exllamav2Model
  File "D:\text-generation-webui\modules\exllamav2.py", line 5, in <module>
    from exllamav2 import (
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 25, in <module>
    from exllamav2.linear import ExLlamaV2Linear
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 7, in <module>
    from exllamav2.module import ExLlamaV2Module
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 14, in <module>
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    ^^
NameError: name 'os' is not defined
01:55:54-858096 INFO Loading "14b-exl"
01:55:56-017617 ERROR Failed to load the model.
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 368, in ExLlamav2_loader
    model, tokenizer = Exllamav2Model.from_pretrained(model_name)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\exllamav2.py", line 60, in from_pretrained
    model.load(split)
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 333, in load
    for item in f: x = item
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 356, in load_gen
    module.load()
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\attn.py", line 255, in load
    self.k_proj.load()
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 92, in load
    if w is None: w = self.load_weight()
                      ^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 110, in load_weight
    qtensors = self.load_multi(key, ["q_weight", "q_invperm", "q_scale", "q_scale_max", "q_groups", "q_perm", "bias"])
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 90, in load_multi
    tensors[k] = stfile.get_tensor(key + "." + k, device=self.device())
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\fasttensors.py", line 204, in get_tensor
    tensor = f.get_tensor(key)
             ^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
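A quick way to confirm a compute-capability mismatch like this (a sketch, assuming PyTorch is installed and can see the GPU):

import torch

# "no kernel image is available for execution on the device" usually means
# the installed wheels were not compiled for this GPU's architecture.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: {major}.{minor}")
# A Tesla M40 reports 5.2 (Maxwell); exllamav2 wheels built only for newer
# architectures will not run on it.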