deep-diver/LLM-As-Chatbot

Error with bitsandbytes; can't get it running at all.

Websteria opened this issue · 1 comment

I've followed the instructions for installation on Windows using Miniconda3.

Everything installs correctly, but when I try to run it, the following error occurs:
(alpaca-serve) C:\Alpaca-LoRA-Serve>python app.py --base_url C:\text-generation-webui-new\text-generation-webui\models\llama-7b-hf --ft_ckpt_url chainyo/alpaca-lora-7b

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
C:\Users\jeff_\miniconda3\envs\alpaca-serve\lib\site-packages\bitsandbytes\cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Traceback (most recent call last):
  File "C:\Alpaca-LoRA-Serve\app.py", line 234, in <module>
    run(args)
  File "C:\Alpaca-LoRA-Serve\app.py", line 112, in run
    model, tokenizer = load_model(
  File "C:\Alpaca-LoRA-Serve\model.py", line 11, in load_model
    model = LlamaForCausalLM.from_pretrained(
  File "C:\Users\jeff_\miniconda3\envs\alpaca-serve\lib\site-packages\transformers\modeling_utils.py", line 2619, in from_pretrained
    raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.

I'm unsure how to proceed. Any advice is appreciated.

Not sure, since I don't have a Windows computer. Let's see if anyone else hops in and gives some advice.
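
That said, the traceback itself describes the change it wants: allow the int8-quantized model to offload some modules to the CPU in fp32. Here is an untested sketch (I can't verify it on Windows) of what the call in model.py's load_model could look like, using the llm_int8_enable_fp32_cpu_offload option described in the Hugging Face docs linked in the error, with device_map="auto" standing in for the custom device_map the message mentions:

```python
# Untested sketch of the CPU-offload workaround suggested by the traceback.
# The path below is taken from the original command; adjust as needed.
import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM, LlamaTokenizer

base_model = r"C:\text-generation-webui-new\text-generation-webui\models\llama-7b-hf"

# Keep int8 quantization on the GPU, but let layers that don't fit fall back
# to the CPU in fp32 instead of raising the ValueError above.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = LlamaForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",          # or a hand-written device_map per the HF docs
    torch_dtype=torch.float16,
)
tokenizer = LlamaTokenizer.from_pretrained(base_model)
```

One caveat: the log also shows bitsandbytes was built without GPU support on this machine ("8-bit optimizers and GPU quantization are unavailable"), so even with offloading, the 8-bit load may still fail on Windows. If anyone has a working Windows setup, please share.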