Training fails at high resolutions.
Hi there, thank you for this great reproduction. I also reproduced the training code based on the Stability AI codebase. When I start training at high resolutions (e.g. 576x1024, 512x896), the model only produces a sequence of blurry frames like the example below:
However, if I run it at a relatively low resolution (e.g. 320x576, 448x768) with all other settings unchanged, the sampling results are as good as the public results. In fact, I found that the degraded results appear whenever the video width exceeds 800 pixels, even though the official recommendation is to run at 576x1024. It's possible that I accidentally modified something in my codebase, and I'm still investigating. I'm curious: have you ever tried training at high resolutions, and have you run into a similar problem?
I've tried training at resolutions like 576x1024 and didn't encounter this problem; it looks very strange. Could you share your specific settings? Also, are the image_latents being fed into the UNet normally?
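For reference, this is roughly the sanity check I would run on the conditioning path. It is only a sketch with dummy tensors in place of a real batch; the channel-wise concatenation and argument names follow diffusers' public StableVideoDiffusionPipeline and UNetSpatioTemporalConditionModel, so adjust the variable names to whatever your script uses:

```python
import torch
from diffusers import UNetSpatioTemporalConditionModel

# Sketch of a shape check for the image-latent conditioning, assuming the
# SVD-style wiring from diffusers' StableVideoDiffusionPipeline.
unet = UNetSpatioTemporalConditionModel.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", subfolder="unet"
)

b, f = 1, 2                    # small frame count, just for the check
h, w = 576 // 8, 1024 // 8     # latent spatial size at 576x1024

noisy_latents = torch.randn(b, f, 4, h, w)  # noised video latents
image_latents = torch.randn(b, f, 4, h, w)  # conditioning-frame latents, repeated over frames

# The SVD UNet expects 8 input channels: 4 noisy + 4 image latents, concatenated on dim=2.
unet_input = torch.cat([noisy_latents, image_latents], dim=2)
assert unet_input.shape[2] == unet.config.in_channels

with torch.no_grad():
    pred = unet(
        unet_input,
        torch.tensor([10]),                             # timestep
        encoder_hidden_states=torch.randn(b, 1, 1024),  # CLIP image embedding
        added_time_ids=torch.randn(b, 3),               # fps, motion bucket id, noise aug
    ).sample

print(pred.shape)  # expected: (b, f, 4, h, w)
```

If a check like this passes at both low and high resolution, the conditioning path is probably not the culprit.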
Thanks for your nice reply! I have checked the conditioning inputs and found no problems. The issue should be inside the UNet blocks, yet they contain no resolution-dependent operations (a width of 100 pixels in latent space is the threshold at which my problem appears). That's weird. Anyway, I'm going to close this issue and will post an update if I find out what's wrong. Thanks again for your kind help!
@pixeli99 Hi, at 576x1024 resolution I can't train on an 80GB A100; I get a CUDA out-of-memory error. I'm following the provided code with frames=25. Do you know what might be happening, and how much GPU memory does training take in your case?
Have you tried DeepSpeed?
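In case it helps, here is a rough sketch of how DeepSpeed ZeRO-2 plus gradient checkpointing could be wired into an accelerate-based training script. The dataloader below is only a placeholder for the real video dataset, and actual memory use still depends on batch size and frame count:

```python
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin
from diffusers import UNetSpatioTemporalConditionModel

# Sketch only: launch with `accelerate launch train.py`; dataset/loss code omitted.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,                    # shard optimizer states and gradients across GPUs
    offload_optimizer_device="cpu",  # optionally push optimizer states to CPU RAM
    gradient_accumulation_steps=1,
)
accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)

unet = UNetSpatioTemporalConditionModel.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", subfolder="unet"
)
unet.enable_gradient_checkpointing()  # large activation-memory saving at 576x1024, frames=25

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

# Placeholder dataloader standing in for the real video dataset (batch size 1 per GPU);
# DeepSpeed infers its micro batch size from the dataloader passed to prepare().
train_dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.zeros(4, 1)), batch_size=1
)

unet, optimizer, train_dataloader = accelerator.prepare(unet, optimizer, train_dataloader)
```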