Discussion on Computing Resources and Training Details

Question

Hello, I am very interested in your work and really impressed with your demo. How many GPUs did you use to train the diffusion model, and how long did training take? Also, the dataset is sampled from WebVid-10M, and I noticed that you sample only 16 frames from each video. How do you ensure that the sampled sequences are sufficiently dynamic, and is the 16-frame sampling a tradeoff? Looking forward to your response!

Answer 1 · 2024-03-16T03:41:15.000Z

Hi, the model was trained on 16 A100 80GB GPUs for two weeks. The 16 frames are sampled with a frame interval between 1 and 5, so that they span a larger window (~2 seconds) of the original video. Please refer to the supplementary material of the paper for more details.
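
For readers who want to try this kind of sampling, here is a minimal sketch of picking 16 frame indices with a frame interval between 1 and 5. It is not the authors' code. The function name `sample_frame_indices`, its parameters, the uniform random choice of interval per clip, and the random start position are all assumptions made for illustration; the supplementary material is the reference for the actual procedure.

```python
import random


def sample_frame_indices(num_video_frames, num_frames=16,
                         min_stride=1, max_stride=5, rng=None):
    """Pick `num_frames` evenly spaced frame indices from one video.

    Hypothetical helper: the stride (frame interval) is drawn uniformly
    from [min_stride, max_stride], which is an assumption, not a detail
    confirmed in the thread.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    rng = rng or random.Random()

    # Largest stride for which the whole clip still fits inside the video.
    max_fit = (num_video_frames - 1) // (num_frames - 1)
    if max_fit < min_stride:
        raise ValueError("video is too short for the requested clip")

    stride = rng.randint(min_stride, min(max_stride, max_fit))
    span = (num_frames - 1) * stride  # distance from first to last index

    # Assumed: the clip may start anywhere it fits.
    start = rng.randint(0, num_video_frames - 1 - span)
    return [start + i * stride for i in range(num_frames)]


if __name__ == "__main__":
    indices = sample_frame_indices(300, rng=random.Random(0))
    print(indices)
```

With an interval of `s`, the 16 frames cover `15 * s` source frames, so a larger interval gives more motion per clip at the cost of coarser temporal resolution.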