Questions about implementation
Closed this issue · 7 comments
yjhong89 commented
Hi! Thanks for sharing training code!
While analyzing the implementation in detail, I came up with a few questions.
- Why use the indexing `[index::column_size]`? Since `latents_list[i_s]` has shape `[bs, c, t, h, w]`, doesn't `latents_list[i_s][index::column_size]` just select samples along the batch dimension?
- How does the video sync group work?
- If I use 8 GPUs with the default parallel-group hyperparameters, `sp_group_size` and `video_sync_group` would both be 8.
- Since sequence parallelism already splits the long token sequence so that every GPU has access to the same video input, why is `video_sync_group` necessary?
- When video latents are extracted in advance, do all videos have the same fps? This line means that if "frame" is not specified in the annotation, the first 121 frames are extracted.
- Why multiply by 2 here? Is it to preserve the variance at each stage?
Thanks!
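For the first question above, the effect of a strided slice on the batch dimension can be sketched in plain Python. The values of `batch_size` and `column_size` are hypothetical and chosen only for illustration, and a list of sample ids stands in for the batch dimension of `latents_list[i_s]`; how the repository actually orders samples within a batch is not shown in this thread.

```python
# Hypothetical sizes, for illustration only.
batch_size = 8
column_size = 4

# Stand-in for the batch dimension: each entry is a sample id.
samples = list(range(batch_size))

# For each starting offset, the slice keeps every column_size-th sample.
groups = {index: samples[index::column_size] for index in range(column_size)}

for index, picked in groups.items():
    print(index, picked)
```

Each offset therefore yields a sub-batch of `batch_size / column_size` samples rather than a single sample, and the sub-batches for different offsets do not overlap.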
jy0205 commented
Here are the answers to your questions:
- `latents_list[i_s][index::column_size]` aims to get a batch of samples that belong to the same stage.
- We do not use sequence parallelism in training. The sequence-parallel code is for multi-GPU inference. The parameter `video_sync_group` controls the group of processes that receive the same input sample.
- We directly use 24 fps for training. The `frames` key lets you specify the frame indexes you want to extract.
- The multiplication makes the variance of the noise still equal to 1 after bilinear interpolation (so that it satisfies a standard Gaussian).
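The variance argument in the last answer can be checked numerically. The sketch below assumes that a 2x bilinear downsample of independent unit-variance noise amounts to averaging each non-overlapping 2x2 block; the repository's actual interpolation call is not shown in this thread, so `average_2x2` is a hypothetical stand-in. Averaging 4 independent samples gives variance 1/4, so multiplying by 2 brings the variance back to 1.

```python
import random
import statistics


def average_2x2(img):
    """Downsample by 2 by averaging each non-overlapping 2x2 block."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
size = 200              # hypothetical spatial size (must be even)
noise = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]

down = average_2x2(noise)
flat = [v for row in down for v in row]

var_down = statistics.pvariance(flat)                          # close to 0.25
var_rescaled = statistics.pvariance([2.0 * v for v in flat])   # close to 1.0

print(f"variance after downsampling: {var_down:.3f}")
print(f"variance after multiplying by 2: {var_rescaled:.3f}")
```

The printed values are sample estimates, so they fluctuate around 0.25 and 1.0 rather than matching them exactly.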
yjhong89 commented
Thanks!
yjhong89 commented
Another question:
- Is I2V training supported? The current implementation seems to support only T2V.
feifeiobama commented
Theoretically, it naturally performs I2V training during autoregressive training (since the first frame is an image). However, we have not explicitly optimized for I2V, so the performance may be suboptimal. We are working on some improvements and will share them in due time.
yjhong89 commented
Yes, that sounds right. Autoregressive training naturally performs I2V training.
- Looking forward to the improvements! Thanks.
Another question:
- Then why are the scale factors for the first frame and the remaining frames different, given that the same VAE is used?
feifeiobama commented
Great observation! Please refer to #28 (comment).
yjhong89 commented
Thanks for the quick answer!