project-baize/baize-chatbot

run app.py error

XvHaidong opened this issue · 8 comments

Hello, when I run demo/app.py with the 7B model, I get the error `"addmm_impl_cpu_" not implemented for 'Half'`. Could you please tell me how to fix it?
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
output = await app.get_blocks().process_api(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/blocks.py", line 1069, in process_api
result = await self.call_function(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/blocks.py", line 892, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/utils.py", line 549, in async_iteration
return next(iterator)
File "app.py", line 43, in predict
for x in greedy_search(input_ids,model,tokenizer,stop_words=["[|Human|]", "[|AI|]"],max_length=max_length_tokens,temperature=temperature,top_p=top_p):
File "/media/hlt/disk/chenyang_space/chenyang_space/xhd_space/baize-main/demo/app_modules/utils.py", line 253, in greedy_search
outputs = model(input_ids)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/peft/peft_model.py", line 575, in forward
return self.base_model(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/peft/tuners/lora.py", line 406, in forward
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

hecor commented

Got this error too on macbook m1, please help, thanks~

Fix done, please check again.
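For anyone hitting the same error elsewhere: it typically means a float16 (half) matmul is running on CPU, where older torch builds have no `addmm` kernel for `Half`. A minimal sketch of the usual workaround (hypothetical model, not the actual Baize fix) is to keep the weights in float32 whenever no GPU is available:

```python
import torch

# Sketch, assuming a possibly CPU-only machine: fp16 is fine on GPU,
# but some fp16 kernels (e.g. addmm) lack CPU implementations in
# older torch versions, so fall back to fp32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 8)  # stand-in for the real model
if device == "cuda":
    model = model.half().to(device)  # half precision on GPU
else:
    model = model.float()            # stay in float32 on CPU

# Inputs must match the model's dtype, or the same mismatch reappears.
x = torch.randn(1, 8, device=device, dtype=next(model.parameters()).dtype)
out = model(x)
print(out.shape, out.dtype)
```

The same idea applies to a Hugging Face model: pass `torch_dtype=torch.float32` (or call `model.float()`) when loading for CPU inference.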

hecor commented

Great, thanks!

hecor commented

But it is very slow generating replies on a MacBook M1, nearly one word per minute. Are there any parameters that can change this?

You need to use a GPU. It is very slow on CPU.

hecor commented

got it, thanks~

zay95 commented

Hi, I run demo/app.py on a remote server with the 7B model, and the terminal shows:
Reloading javascript...
Running on local URL: http://127.0.0.1:7860

but the URL doesn't work in Chrome on my local machine.

Set share=True in app.py and use the public URL.
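To expand on that: `http://127.0.0.1:7860` is only reachable from the server itself. Assuming app.py ends in a `demo.launch()` call (the exact variable name may differ), the change is a launch-parameter tweak:

```python
# Hypothetical fragment of demo/app.py -- adapt to the real launch call.
# share=True asks Gradio to create a temporary public *.gradio.live URL;
# server_name="0.0.0.0" additionally binds the port on all interfaces,
# so http://<server-ip>:7860 or an SSH tunnel (ssh -L 7860:localhost:7860)
# also works without the share link.
demo.launch(share=True, server_name="0.0.0.0", server_port=7860)
```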