project-baize/baize-chatbot

run app.py error

XvHaidong opened this issue · 8 comments

Hello, when I run demo/app.py with the 7B model, I get the error `"addmm_impl_cpu_" not implemented for 'Half'`. Could you please tell me how to fix it?
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
output = await app.get_blocks().process_api(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/blocks.py", line 1069, in process_api
result = await self.call_function(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/blocks.py", line 892, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/utils.py", line 549, in async_iteration
return next(iterator)
File "app.py", line 43, in predict
for x in greedy_search(input_ids,model,tokenizer,stop_words=["[|Human|]", "[|AI|]"],max_length=max_length_tokens,temperature=temperature,top_p=top_p):
File "/media/hlt/disk/chenyang_space/chenyang_space/xhd_space/baize-main/demo/app_modules/utils.py", line 253, in greedy_search
outputs = model(input_ids)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/peft/peft_model.py", line 575, in forward
return self.base_model(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/peft/tuners/lora.py", line 406, in forward
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

hecor commented

Got this error too on macbook m1, please help, thanks~

Fix done, please check again.
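For anyone hitting the same error elsewhere: it typically means a float16 (half) matmul is running on CPU, where older torch builds have no `addmm` kernel for `Half`. A minimal sketch of the usual workaround (hypothetical model, not the actual Baize fix) is to keep the weights in float32 whenever no GPU is available:

```python
import torch

# Sketch, assuming a possibly CPU-only machine: fp16 is fine on GPU,
# but some fp16 kernels (e.g. addmm) lack CPU implementations in
# older torch versions, so fall back to fp32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 8)  # stand-in for the real model
if device == "cuda":
    model = model.half().to(device)  # half precision on GPU
else:
    model = model.float()            # stay in float32 on CPU

# Inputs must match the model's dtype, or the same mismatch reappears.
x = torch.randn(1, 8, device=device, dtype=next(model.parameters()).dtype)
out = model(x)
print(out.shape, out.dtype)
```

The same idea applies to a Hugging Face model: pass `torch_dtype=torch.float32` (or call `model.float()`) when loading for CPU inference.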

hecor commented

Great, thanks!

hecor commented

But it is very slow generating replies on a MacBook M1, nearly one word per minute. Are there any parameters that can change this?

You need to use a GPU. It is very slow on CPU.

hecor commented

got it, thanks~

zay95 commented

Hi, I run demo/app.py on a remote server with the 7B model, and the terminal shows:
Reloading javascript...
Running on local URL: http://127.0.0.1:7860

but the URL doesn't work in Chrome on my local machine.

Set share=True in app.py and use the public URL.
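To expand on that: `http://127.0.0.1:7860` is only reachable from the server itself. Assuming app.py ends in a `demo.launch()` call (the exact variable name may differ), the change is a launch-parameter tweak:

```python
# Hypothetical fragment of demo/app.py -- adapt to the real launch call.
# share=True asks Gradio to create a temporary public *.gradio.live URL;
# server_name="0.0.0.0" additionally binds the port on all interfaces,
# so http://<server-ip>:7860 or an SSH tunnel (ssh -L 7860:localhost:7860)
# also works without the share link.
demo.launch(share=True, server_name="0.0.0.0", server_port=7860)
```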