With the t2v CogVideoX1.5-5B model, videos generated from the homepage prompts look completely different from the homepage examples, and the top and bottom of each video look the same (part of the frame is repeated)

lgao-matax opened this issue 3 months ago · 3 comments

lgao-matax commented 3 months ago

System Info

CUDA: 12.4
diffusers: 0.32.0.dev0
Hardware: one machine with 8 A100-40G GPUs

Information

The official example scripts
My own modified scripts

Reproduction

Run cli_demo.py with the pipe changed to pipe = CogVideoXPipeline.from_pretrained(MODEL, torch_dtype=torch.bfloat16,device_map="balanced",max_memory={1: "40GB", 3: "40GB", 4: "40GB"}), and with CPU offload disabled.
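
For readers who want to reproduce this setup, the loading step described above can be sketched roughly as follows. This is only a sketch: the helper names build_max_memory and load_pipeline are hypothetical, MODEL remains whatever path or model ID the reporter used, and the remainder of cli_demo.py is assumed unchanged apart from making no CPU offload call.

```python
def build_max_memory(gpu_ids, per_gpu="40GB"):
    """Hypothetical helper: map each GPU index to a memory cap for device_map."""
    return {gpu_id: per_gpu for gpu_id in gpu_ids}


def load_pipeline(model_path, gpu_ids=(1, 3, 4)):
    """Load the pipeline as in the report: bfloat16, balanced across the given GPUs.

    Heavy imports live inside the function so the helper above can be used
    without a GPU environment.
    """
    import torch
    from diffusers import CogVideoXPipeline

    pipe = CogVideoXPipeline.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,
        device_map="balanced",
        max_memory=build_max_memory(gpu_ids),
    )
    # As in the report, no CPU offload is enabled on the pipeline.
    return pipe
```

Calling load_pipeline(MODEL) would then stand in for the pipeline construction in the script.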

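Expected behavior

The results should match the quality of the official example videos, and the videos should not show the same content repeated at the top and bottom.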

Answer 1 · 2024-12-02T05:58:57.000Z

What do you mean exactly? Could you give a more specific explanation? Is the result different from the example video you mentioned?

Answer 2 · 2024-12-03T01:15:27.000Z

Like this: the top and bottom are not exactly identical, but they are very similar, so it looks as if the bottom repeats part of the top.

Answer 3 · 2024-12-03T01:20:09.000Z

The panda prompt: prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."