Incorrect sequence parallel for CogVideoX?
monellz opened this issue · 3 comments
CogVideoX's attention processor concatenates encoder_hidden_states and hidden_states along the sequence dim. But the current sequence-parallel implementation in videosys seems to split only hidden_states along the sequence dim, and then concatenates the entire encoder_hidden_states with the split hidden_states. The computation semantics of attention after the all-to-all therefore appear to differ from those of the non-parallel case.
I don't understand why videosys splits only hidden_states along the sequence dim.
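For illustration, here is a minimal sketch of the pattern being described (the `split_sequence` helper, the shapes, and the two-rank setup are assumptions for illustration, not the actual VideoSys code):

```python
import torch

def split_sequence(x: torch.Tensor, world_size: int, rank: int, dim: int = 1) -> torch.Tensor:
    """Keep only this rank's contiguous chunk along the sequence dim."""
    return x.chunk(world_size, dim=dim)[rank]

# Shapes: hidden_states (B, S_vid, D), encoder_hidden_states (B, S_txt, D)
B, S_vid, S_txt, D = 1, 8, 4, 16
hidden_states = torch.randn(B, S_vid, D)
encoder_hidden_states = torch.randn(B, S_txt, D)

# Single-GPU reference: CogVideoX concatenates text and video tokens
# along the sequence dim before attention.
full_seq = torch.cat([encoder_hidden_states, hidden_states], dim=1)
assert full_seq.shape == (B, S_txt + S_vid, D)

# Sequence-parallel variant as described in the issue: only
# hidden_states is split; each rank still prepends the FULL
# encoder_hidden_states to its shard.
world_size, rank = 2, 0
hs_shard = split_sequence(hidden_states, world_size, rank)
local_seq = torch.cat([encoder_hidden_states, hs_shard], dim=1)
assert local_seq.shape == (B, S_txt + S_vid // world_size, D)

# If an all-to-all later regroups these local sequences along the
# sequence dim, the text tokens show up once per rank instead of once
# overall: world_size * S_txt + S_vid tokens rather than
# S_txt + S_vid, so the attention no longer matches the single-GPU
# computation.
```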
The three figures show the first frames generated with 1 GPU (no parallelism), 2 GPUs (cp_size=2), and 4 GPUs (cp_size=2, sp_size=2). While all of them look natural, there are notable differences (such as the fallen leaves to the left of the dog) that should not occur.
yeah, that's a problem. we will fix it soon. thanks for your feedback!
I've run into a similar problem and don't know how to solve it.