lucidrains/magvit2-pytorch

‘video_contains_first_frame’ in encoder

Du-Yao opened this issue · 1 comments

Du-Yao commented

Great work! But I think there is a mistake in the code. At line 1563 of magvit2_pytorch.py, I notice the authors use left padding along the time axis, so the first frame should be video[:, :, self.time_padding], and the rest of the video should be video[:, :, (self.time_padding + 1):]. Please check the code; if I have misunderstood, please point that out as well.
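To make the indexing concrete, here is a minimal sketch of the slicing described above. It is not the repository's code: the helper name split_first_frame, the (batch, channels, time, height, width) layout, and the zero padding are assumptions for illustration. NumPy is used only to keep the example light; the same slices apply to a torch tensor.

```python
import numpy as np

def split_first_frame(padded_video, time_padding):
    """Hypothetical helper: split a left-padded video into first frame and rest.

    Assumes padded_video has shape (batch, channels, time_padding + frames, height, width),
    with the padding frames placed before the real frames on the time axis.
    """
    first_frame = padded_video[:, :, time_padding]       # first real frame
    rest = padded_video[:, :, (time_padding + 1):]       # all later real frames
    return first_frame, rest

batch, channels, frames, height, width = 1, 3, 5, 2, 2
time_padding = 2

# Fill frame i with the value i + 1 so that zeros can only come from padding.
video = np.stack(
    [np.full((batch, channels, height, width), i + 1.0) for i in range(frames)],
    axis=2,
)

# Left-pad the time axis (axis 2) with `time_padding` zero frames.
padded = np.pad(video, ((0, 0), (0, 0), (time_padding, 0), (0, 0), (0, 0)))

first_frame, rest = split_first_frame(padded, time_padding)
```

With this layout, padded[:, :, 0] is a padding frame, so indexing the first frame at position 0 would pick up padding rather than real content.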
I have another question. When I use this code to train on an image dataset, why are the reconstructed images identical for different input images, both with randomly initialized parameters and with a trained model? Additionally, the loss decreases normally, but the reconstructed images are still all the same, apart from being completely black or some other color. How can I solve this problem? Thanks!

@Du-Yao ah yes, there was an issue with encoding the first frame separately

could you let me know if the latest version looks better?