jy0205/Pyramid-Flow

Questions about implementation


Hi! Thanks for sharing training code!

While analyzing the implementation in detail, I came up with a few questions.

  1. clean_latent = latents_list[i_s][index::column_size] # [bs, c, t, h, w]
  • Why use the indexing [index::column_size]? Since latents_list[i_s] has shape [bs, c, t, h, w], latents_list[i_s][index::column_size] just slices the batch dimension, doesn't it?
  2. How does the video sync group work?
  • With 8 GPUs and the default parallel-group hyperparameters, sp_group_size and video_sync_group are both 8.
  • Since sequence parallelism already splits the long token sequence, every GPU has access to the same video input, so why is video_sync_group necessary?
  3. When extracting video latents in advance, do all videos have the same fps? This line seems to mean that if "frame" is not specified in the annotation, the first 121 frames are extracted.
  4. Why multiply by 2 here? Is it to preserve the variance at each stage?

Thanks!

Here are the answers to your questions:

  1. latents_list[i_s][index::column_size] selects the samples in the batch that belong to the same stage (see the indexing sketch after this list).
  2. We do not use sequence parallelism in training; the sequence-parallel code is for multi-GPU inference. The video_sync_group parameter controls the group of processes that receive the same input sample (see the sketch below).
  3. We directly use 24 fps for training. The frames key lets you specify the frame indexes you want to extract.
  4. The multiplication keeps the variance of the noise equal to 1 after bilinear interpolation, so it still satisfies a standard Gaussian (see the variance check below).
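
To make the slicing in answer 1 concrete, here is a minimal sketch of what [index::column_size] does on the batch dimension. The layout assumption (that sample b in the packed batch belongs to stage b % column_size) is inferred from the answer above rather than taken from the repo, and the shapes and values of column_size and index are made up for illustration.

```python
import torch

# Minimal sketch of the strided batch indexing from answer 1.
# Assumption (not from the repo code): the packed batch interleaves stages,
# so sample b belongs to stage (b % column_size).
bs, c, t, h, w = 8, 4, 5, 16, 16
column_size = 4                              # number of stages packed into one batch (hypothetical)
latents = torch.randn(bs, c, t, h, w)

index = 1                                    # stage id i_s (hypothetical)
clean_latent = latents[index::column_size]   # every column_size-th sample, starting at `index`
print(clean_latent.shape)                    # torch.Size([2, 4, 5, 16, 16])
```

So the slice does not return the whole batch; it picks out the subset of samples assigned to stage index.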
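For answer 2, here is a purely hypothetical sketch of how a video_sync_group can partition ranks so that processes in the same group load the identical sample. The helper names and the rank-to-group mapping are assumptions for illustration, not the repo's actual sampler code.

```python
def data_rank(rank: int, video_sync_group: int) -> int:
    # Hypothetical mapping: ranks inside the same sync group share one "data rank",
    # so they draw the same input sample from the dataset.
    return rank // video_sync_group

def num_data_shards(world_size: int, video_sync_group: int) -> int:
    # Number of distinct samples loaded per step across the whole job.
    return world_size // video_sync_group

# Example: 8 GPUs, video_sync_group = 8 -> one shard, every rank sees the same video.
#          8 GPUs, video_sync_group = 2 -> four shards, pairs of ranks share a video.
```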
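For answer 4, a small numerical check of the variance argument, assuming the bilinear interpolation in question is a 2x spatial downsample of i.i.d. unit-variance Gaussian noise (shown in 2D for simplicity):

```python
import torch
import torch.nn.functional as F

noise = torch.randn(1, 4, 1024, 1024)          # standard Gaussian noise, std ~ 1
down = F.interpolate(noise, scale_factor=0.5,
                     mode='bilinear', align_corners=False)

print(noise.std())        # ~1.00
print(down.std())         # ~0.50  (each output pixel averages a 2x2 block: var = 4 * 0.25**2 = 0.25)
print((down * 2).std())   # ~1.00  -> multiplying by 2 restores unit variance
```

Averaging a 2x2 block of independent N(0, 1) values gives variance 1/4, i.e. standard deviation 1/2, so scaling by 2 brings the noise back to a standard Gaussian.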

Thanks!

Another question:

  • Does it support I2V training? The current implementation seems to only support T2V.

Theoretically, it naturally performs I2V training during autoregressive training (since the first frame is an image). However, we have not explicitly optimized for I2V, so the performance may be suboptimal. We are working on some improvements and will share them in due time.

Yes, that sounds right. Autoregressive training naturally performs I2V training.

  • Looking forward to it! Thanks.

Another question?

Great observation! Please refer to #28 (comment).

Thanks for the quick answer!