‘video_contains_first_frame’ in encoder
Du-Yao opened this issue · 1 comments
Great work! I think there is a mistake in the code, though. At line 1563 of magvit2_pytorch.py, the video is left-padded along the time axis, so the first frame should be video[:, :, self.time_padding], and the rest of the video should be video[:, :, (self.time_padding + 1):]. Please check the code; if I have misunderstood, please point that out as well.
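To make the indexing concrete, here is a minimal sketch of the left-pad case described above. It uses NumPy arrays as a stand-in for PyTorch tensors, and it is not the repository's code: the helper split_first_frame, the (batch, channels, time, height, width) layout, and the value time_padding = 2 are all assumptions chosen for illustration.

```python
import numpy as np


def split_first_frame(padded_video, time_padding):
    """Split a left-padded video into its first real frame and the rest.

    Hypothetical helper. Assumes the layout
    (batch, channels, time, height, width) and that `time_padding`
    zero frames were prepended along the time axis.
    """
    # A slice (not a bare index) keeps the time dimension on the first frame.
    first_frame = padded_video[:, :, time_padding:time_padding + 1]
    rest = padded_video[:, :, time_padding + 1:]
    return first_frame, rest


time_padding = 2  # assumed value, for illustration only

# Four frames whose pixel values are 1, 2, 3, 4, so each frame is identifiable.
video = np.arange(1, 5, dtype=np.float32).reshape(1, 1, 4, 1, 1)

# Left pad: zeros are added only before the first frame on the time axis.
pad_widths = [(0, 0), (0, 0), (time_padding, 0), (0, 0), (0, 0)]
padded = np.pad(video, pad_widths)

first, rest = split_first_frame(padded, time_padding)

print(padded[:, :, :1].ravel())  # index 0 holds padding, not a real frame
print(first.ravel())             # the first real frame
print(rest.ravel())              # the remaining frames
```

With left padding, index 0 on the time axis holds a padding frame, so slicing the first frame from index 0 would pick up zeros instead of real content.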
I have another question. When I use this code to train on an image dataset, why are the reconstructed images identical for different input images, whether the model has randomly initialized parameters or has been trained? The loss also decreases normally, yet the reconstructed images are all the same, apart from being completely black or some other color. How can I solve this problem? Thanks!
@Du-Yao ah yes, there was an issue with encoding the first frame separately.
Could you let me know if the latest version looks better?