tensorchord/modelz-llm

bug: RuntimeError: "LayerNormKernelImpl" not implemented for 'Half' in chatglm int4

Closed this issue · 2 comments

2023-06-02 19:24:42,485 - 254807 - ERROR - app.py:1047 - [FALCON] Unhandled exception in ASGI app
Traceback (most recent call last):
  File "/home/gaocegege/applications/miniconda3/envs/dev/lib/python3.9/site-packages/falcon/asgi/app.py", line 406, in __call__
    await responder(req, resp, **params)
  File "/home/gaocegege/code/go/src/github.com/tensorchord/modelz-llm/src/modelz_llm/falcon_service.py", line 283, in on_post
    for comp in self.model.step_generate(chat_req):
  File "/home/gaocegege/code/go/src/github.com/tensorchord/modelz-llm/src/modelz_llm/falcon_service.py", line 146, in step_generate
    out = self.model(
  File "/home/gaocegege/applications/miniconda3/envs/dev/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gaocegege/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/02a065cf2797029c036a02cac30f1da1a9bc49a3/modeling_chatglm.py", line 1190, in forward
    transformer_outputs = self.transformer(
  File "/home/gaocegege/applications/miniconda3/envs/dev/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gaocegege/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/02a065cf2797029c036a02cac30f1da1a9bc49a3/modeling_chatglm.py", line 996, in forward
    layer_ret = layer(
  File "/home/gaocegege/applications/miniconda3/envs/dev/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gaocegege/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/02a065cf2797029c036a02cac30f1da1a9bc49a3/modeling_chatglm.py", line 624, in forward
    attention_input = self.input_layernorm(hidden_states)
  File "/home/gaocegege/applications/miniconda3/envs/dev/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gaocegege/applications/miniconda3/envs/dev/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/gaocegege/applications/miniconda3/envs/dev/lib/python3.9/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

We may need to cast the model to half precision for the 4-bit (int4) checkpoint:

model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
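
For context, this error typically appears when fp16 tensors reach LayerNorm on the CPU, so the dtype has to match the device. Below is a minimal sketch of guarding the snippet above; `pick_dtype_name` and `load_chatglm_int4` are hypothetical helper names, not part of modelz-llm, and this is not necessarily what #64 does.

```python
def pick_dtype_name(cuda_available: bool) -> str:
    """Use fp16 only when a GPU is present; otherwise fall back to fp32."""
    return "float16" if cuda_available else "float32"


def load_chatglm_int4(model_id: str = "THUDM/chatglm-6b-int4"):
    """Hypothetical loader: half precision on GPU, full precision on CPU."""
    # Imported lazily so pick_dtype_name stays usable without torch installed.
    import torch
    from transformers import AutoModel

    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    if pick_dtype_name(torch.cuda.is_available()) == "float16":
        # GPU path: same call chain as the snippet above.
        return model.half().cuda().eval()
    # CPU path: keep float32 so LayerNorm never receives Half tensors.
    return model.float().eval()
```

With this guard, a machine without CUDA loads the model in float32 instead of failing inside `layer_norm`.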

Can you try it again? This should be fixed in #64.