Incorrect sequence parallel for CogVideoX?
monellz opened this issue · 3 comments
CogVideoX's attention processor concatenates encoder_hidden_states and hidden_states along the sequence dim. But the current sequence-parallel implementation in videosys seems to split only hidden_states along the sequence dim, and then concatenates the entire encoder_hidden_states with the split hidden_states. The computation semantics of attention after the all-to-all therefore appear to differ from those of the non-parallel case.
I don't understand why videosys splits only hidden_states along the sequence dim.
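For illustration, here is a minimal sketch of the pattern being described (the `split_sequence` helper, the shapes, and the two-rank setup are assumptions for illustration, not the actual VideoSys code):

```python
import torch

def split_sequence(x: torch.Tensor, world_size: int, rank: int, dim: int = 1) -> torch.Tensor:
    """Keep only this rank's contiguous chunk along the sequence dim."""
    return x.chunk(world_size, dim=dim)[rank]

# Shapes: hidden_states (B, S_vid, D), encoder_hidden_states (B, S_txt, D)
B, S_vid, S_txt, D = 1, 8, 4, 16
hidden_states = torch.randn(B, S_vid, D)
encoder_hidden_states = torch.randn(B, S_txt, D)

# Single-GPU reference: CogVideoX concatenates text and video tokens
# along the sequence dim before attention.
full_seq = torch.cat([encoder_hidden_states, hidden_states], dim=1)
assert full_seq.shape == (B, S_txt + S_vid, D)

# Sequence-parallel variant as described in the issue: only
# hidden_states is split; each rank still prepends the FULL
# encoder_hidden_states to its shard.
world_size, rank = 2, 0
hs_shard = split_sequence(hidden_states, world_size, rank)
local_seq = torch.cat([encoder_hidden_states, hs_shard], dim=1)
assert local_seq.shape == (B, S_txt + S_vid // world_size, D)

# If an all-to-all later regroups these local sequences along the
# sequence dim, the text tokens show up once per rank instead of once
# overall: world_size * S_txt + S_vid tokens rather than
# S_txt + S_vid, so the attention no longer matches the single-GPU
# computation.
```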
The three figures show the first frames generated with 1 GPU (no parallelism), 2 GPUs (cp_size=2), and 4 GPUs (cp_size=2, sp_size=2). While all of them look natural, there are notable differences (such as the fallen leaves to the left of the dog) that should not occur.
yeah, that's a problem. we will fix it soon. thanks for your feedback!
I've run into a similar problem and don't know how to solve it.