"addmm_impl_cpu_" not implemented for 'Half'
hnuzhoulin opened this issue · 2 comments
hnuzhoulin commented
It always runs on the CPU, using just one core.
Card info:
Wed Jun 7 18:22:55 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13 Driver Version: 525.60.13 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:00:0D.0 Off | 0 |
| N/A 33C P0 34W / 250W | 3300MiB / 32768MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 194690 C python3 3296MiB |
+-----------------------------------------------------------------------------+
Running logs:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Loading binary /home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
hnuzhoulin commented
I recompiled bitsandbytes and copied the new libbitsandbytes_cuda121.so over libbitsandbytes_cpu.so. After that, the following warning disappeared:
UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable. warn("The installed version of bitsandbytes was compiled without GPU support. "
But when I ask a question, I get this error:
Traceback (most recent call last):
File "/home/zhoulin/LaWGPT/utils/callbacks.py", line 47, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/home/zhoulin/LaWGPT/webui.py", line 140, in generate_with_callback
model.generate(**kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/peft/peft_model.py", line 627, in generate
outputs = self.base_model.generate(**kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/transformers/generation/utils.py", line 1522, in generate
return self.greedy_search(
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
outputs = self(
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
outputs = self.model(
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 578, in forward
layer_outputs = decoder_layer(
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/peft/tuners/lora.py", line 406, in forward
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
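For context, this RuntimeError means the LoRA layer is still executing on the CPU in half precision, and the CPU backend of the installed PyTorch has no fp16 `addmm` kernel. A minimal sketch of the failure and the usual workaround (move the model to CUDA, or cast to float32 if it must stay on CPU) — the tensor shapes here are arbitrary, and whether the first call raises depends on the PyTorch build:

```python
import torch

# fp16 tensors on the CPU, mimicking a model loaded with .half()
# but never moved to the GPU
x = torch.randn(2, 4, dtype=torch.float16)
w = torch.randn(3, 4, dtype=torch.float16)

try:
    # The same call that fails inside peft's lora.py: F.linear on
    # half-precision CPU tensors
    y = torch.nn.functional.linear(x, w)
except RuntimeError as e:
    # On PyTorch builds without fp16 CPU kernels, this prints the
    # reported '"addmm_impl_cpu_" not implemented' error
    print(e)

# Workaround when the model must stay on CPU: compute in float32
out = torch.nn.functional.linear(x.float(), w.float())
print(out.dtype)  # torch.float32
```

The cleaner fix for this setup is to make sure the model actually lands on the V100 (e.g. `device_map` placement or `.cuda()`), since fp16 kernels are available there.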
Traceback (most recent call last):
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/gradio/routes.py", line 427, in run_predict
output = await app.get_blocks().process_api(
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/gradio/blocks.py", line 1323, in process_api
result = await self.call_function(
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/gradio/blocks.py", line 1067, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/gradio/utils.py", line 336, in async_iteration
return await iterator.__anext__()
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/gradio/interface.py", line 633, in fn
async for output in iterator:
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/gradio/utils.py", line 329, in __anext__
return await anyio.to_thread.run_sync(
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/anyio/to_thread.py", line 28, in run_sync
return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 754, in run
result = context.run(func, *args)
File "/home/zhoulin/miniconda3/envs/LawGPT/lib/python3.10/site-packages/gradio/utils.py", line 312, in run_sync_iterator_async
return next(iterator)
File "/home/zhoulin/LaWGPT/webui.py", line 156, in evaluate
print(decoded_output)
UnboundLocalError: local variable 'decoded_output' referenced before assignment
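The second traceback is just fallout from the first: generation raised before the output was ever decoded, so `evaluate` in webui.py reached `print(decoded_output)` with that local never assigned. A hypothetical minimal sketch of this failure mode (`run_generation` and `evaluate` are stand-ins, not the actual LaWGPT code):

```python
def run_generation():
    # Stand-in for model.generate(); fails the way fp16-on-CPU does
    raise RuntimeError("\"addmm_impl_cpu_\" not implemented for 'Half'")

def evaluate():
    try:
        output = run_generation()
        decoded_output = output  # never reached, so the local is never bound
    except RuntimeError:
        pass  # the real error is swallowed here...
    print(decoded_output)  # ...so this line raises UnboundLocalError instead

try:
    evaluate()
except UnboundLocalError as err:
    caught = type(err).__name__
print(caught)  # UnboundLocalError
```

So the UnboundLocalError masks the root cause; fixing the Half/CPU issue makes it go away.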
Daniel-Yang-A commented
I encountered the same problem.