‘video_contains_first_frame’ in encoder

Question

Du-Yao opened this issue a year ago · 1 comments

Great work! But I think there is a mistake in the code. At line 1563 of magvit2_pytorch.py, I notice the authors use left padding along the time axis, so the first frame should be video[:, :, self.time_padding], and the rest of the video should be video[:, :, (self.time_padding + 1):]. Please check the code; if I have misunderstood, please point that out as well.
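To make the indexing concrete, here is a minimal sketch of the slicing described above. It does not reproduce the code at line 1563. It assumes a (batch, channels, time, height, width) layout and zero padding on the left of the time axis, and it uses NumPy as a stand-in for PyTorch because the slicing semantics are the same. The helper name `split_first_frame` is hypothetical.

```python
import numpy as np

def split_first_frame(video, time_padding):
    """Split a left-padded video into its first real frame and the rest.

    Assumes `video` has shape (batch, channels, time, height, width) and that
    `time_padding` frames were prepended along the time axis (hypothetical setup).
    """
    first_frame = video[:, :, time_padding]        # first real frame, time axis dropped
    rest = video[:, :, time_padding + 1:]          # remaining real frames
    return first_frame, rest

# Toy video: frame t is filled with the value t + 1, so padding (zeros) is easy to spot.
batch, channels, frames, height, width = 1, 3, 5, 2, 2
video = np.zeros((batch, channels, frames, height, width))
for t in range(frames):
    video[:, :, t] = t + 1

# Left-pad the time axis only (assumed zero padding).
time_padding = 2
padded = np.pad(video, ((0, 0), (0, 0), (time_padding, 0), (0, 0), (0, 0)))

first_frame, rest = split_first_frame(padded, time_padding)
```

With this setup, index 0 of the padded time axis is a padding frame, while index `time_padding` is the first real frame, which is the distinction the question is about.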
I have another question. When I use this code to train on an image dataset, why are the reconstructed images identical for different input images, whether the model has randomly initialized parameters or has been trained? The loss decreases normally, yet the reconstructed images are all the same, apart from being completely black or some other color. How can I solve this problem? Thanks!

Answer 1 · 2024-01-11T14:01:40.000Z

@Du-Yao ah yes, there was an issue with encoding the first frame separately

could you let me know if the latest version looks better?