THUDM/CogVLM2

The cogvlm2-video CLI demo raises: Exception has occurred: RuntimeError: view size is not compatible with input tensor's size and stride

Celine-hxy opened this issue · 3 comments

System Info / 系統信息

Cambricon MLU, PyTorch 2.1, Python 3.10

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

  1. python /home/user/CogVLM2_mlu/video_demo/cli_video_demo.py
  2. Provide this video as input: "https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4"
Exception has occurred: RuntimeError
view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-video-llama3-chat/visual.py", line 78, in forward
    output = self.dense(out.view(B, L, -1))
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-video-llama3-chat/visual.py", line 114, in forward
    attention_output = self.input_layernorm(self.attention(attention_input))
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-video-llama3-chat/visual.py", line 129, in forward
    hidden_states = layer_module(hidden_states)
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-video-llama3-chat/visual.py", line 165, in forward
    x = self.transformer(x)
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-video-llama3-chat/modeling_cogvlm.py", line 374, in encode_images
    images_features = self.vision(images[0])
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-video-llama3-chat/modeling_cogvlm.py", line 402, in forward
    images_features = self.encode_images(images)
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-video-llama3-chat/modeling_cogvlm.py", line 635, in forward
    outputs = self.model(
  File "/home/user/CogVLM2_mlu/video_demo/cli_video_demo.py", line 137, in <module>
    outputs = model.generate(**inputs, **gen_kwargs)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
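The error message itself points at the workaround: `.view()` only works when the dimensions being flattened are contiguous in memory, while `.reshape()` falls back to a copy when they are not. A minimal, repo-independent sketch of the same failure mode (hypothetical tensor shapes, not taken from CogVLM2):

```python
# Minimal sketch (not from CogVLM2): a permuted tensor is non-contiguous,
# so .view() fails with this exact error while .reshape() succeeds.
import torch

x = torch.randn(2, 3, 4).permute(0, 2, 1)   # shape (2, 4, 3), non-contiguous
try:
    x.view(2, -1)                            # RuntimeError: view size is not compatible ...
except RuntimeError as err:
    print("view failed:", err)

y = x.reshape(2, -1)                         # copies when needed, always succeeds
print(y.shape)                               # torch.Size([2, 12])
```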

Expected behavior / 期待表现

.

We have tested this and could not reproduce it. Are you sure you are running on an NVIDIA card?

How did you solve this?


The view operator implementation has issues on some cards. When running on such special hardware, you need to manually replace view with reshape in the code.
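For reference, the line flagged at the top of the traceback (visual.py, line 78 of the cached cogvlm2-video-llama3-chat module) would be changed roughly like this; this is a sketch only, assuming `out`, `B`, `L`, and `self.dense` exactly as they appear in the traceback, not a verified patch:

```python
# visual.py, around line 78 -- sketch of the manual edit
# before: output = self.dense(out.view(B, L, -1))
output = self.dense(out.reshape(B, L, -1))   # reshape tolerates non-contiguous layouts
```

Calling `.contiguous()` on `out` before the existing `.view()` should have the same effect, at the cost of an explicit copy.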