xusenlinzy/api-for-open-llm

ChatGLM3 still reports an error when the input length exceeds 8k

Closed · 0 comments

The following items must be checked before submission

  • Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
  • I have read the FAQ section of the project documentation and searched the existing issues / discussions, and found no similar problem or solution.

Type of problem

Model inference and deployment

Operating system

Linux

Detailed description of the problem

Related issue: #189
The problem persists, but the error is now raised in a different place.

2023-12-13 05:04:37.178 | WARNING | generation.chatglm:generate_stream_chatglm_v3:187 - Input length larger than 8192
Input length of input_ids is 8192, but max_length is set to 8192. This can lead to unexpected behavior. You should consider increasing max_new_tokens.
Traceback (most recent call last):
  File "/app/core/default.py", line 288, in _generate
    for output in self.generate_stream_func(self.model, self.tokenizer, params):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/app/generation/chatglm.py", line 209, in generate_stream_chatglm_v3
    for total_ids in model.stream_generate(**inputs, eos_token_id=eos_token_id, **gen_kwargs):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/.cache/huggingface/modules/transformers_modules/models/modeling_chatglm.py", line 1156, in stream_generate
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/models/modeling_chatglm.py", line 937, in forward
    transformer_outputs = self.transformer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/models/modeling_chatglm.py", line 830, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/models/modeling_chatglm.py", line 640, in forward
    layer_ret = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/models/modeling_chatglm.py", line 542, in forward
    layernorm_output = self.input_layernorm(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/models/modeling_chatglm.py", line 189, in forward
    variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

INFO: 172.21.66.153:43936 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
/opt/pytorch/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [1021,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
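
A note on the likely mechanism: the warning shows the prompt alone already fills the full 8192-token max_length, so there is no room left for generation, and the IndexKernel "index out of bounds" assert is consistent with position/embedding lookups running past the model's trained sequence length. Below is a minimal workaround sketch; the constants, the helper name, and the truncation policy are my own assumptions for illustration, not the project's actual code:

```python
import torch

# Assumed values -- the context window and generation budget may be
# configured differently in the project.
MAX_CONTEXT = 8192      # ChatGLM3's context window
MAX_NEW_TOKENS = 512    # budget reserved for generated tokens

def truncate_inputs(inputs: dict) -> dict:
    """Drop the oldest tokens so that prompt + generation fits in the window."""
    max_input_len = MAX_CONTEXT - MAX_NEW_TOKENS
    input_ids = inputs["input_ids"]
    if input_ids.shape[1] > max_input_len:
        inputs["input_ids"] = input_ids[:, -max_input_len:]
        if "attention_mask" in inputs:
            inputs["attention_mask"] = inputs["attention_mask"][:, -max_input_len:]
        # Rebuild position ids from zero: slicing the old ones would keep
        # large position values and could still index past the embedding table.
        if "position_ids" in inputs:
            inputs["position_ids"] = torch.arange(
                max_input_len, dtype=torch.long, device=input_ids.device
            ).unsqueeze(0)
    return inputs
```

Something like `inputs = truncate_inputs(inputs)` could be applied just before the `model.stream_generate(...)` call in generate_stream_chatglm_v3, together with passing `max_new_tokens=MAX_NEW_TOKENS` in gen_kwargs, so that max_length is no longer the binding limit.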
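
To confirm where the assert actually fires, the hint in the log itself applies: CUDA kernel asserts are reported asynchronously, so the Python stack trace above may point at the wrong op (here it blames a LayerNorm, while the assert comes from an index kernel). Running with CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the trace lands on the real call site. A sketch, assuming the variable can be set before the server process initializes CUDA:

```python
import os

# Must be set before torch initializes CUDA, i.e. before `import torch`
# (equivalently, export CUDA_LAUNCH_BLOCKING=1 in the shell that starts
# the server); setting it after CUDA is initialized has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  -- deliberately imported after setting the variable
```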

Dependencies

No response

Runtime logs or screenshots

No response