microsoft/VQ-Diffusion

Some parameters don't receive gradients.

guyuchao opened this issue · 2 comments

Hello, when I run the training command on COCO, I encounter the following error:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.

Then I found that the parameters in "module.content_codec" and "module.transformer.condition_emb" don't receive gradients. Should we set find_unused_parameters=True in DDP?
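
For context, a minimal sketch of what setting that flag would look like (assuming the distributed process group is already initialized; `model` and `local_rank` here are placeholders, not the repo's actual variable names):

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_model(model: torch.nn.Module, local_rank: int) -> DDP:
        # find_unused_parameters=True tells DDP to tolerate parameters that
        # did not take part in producing the loss in a given iteration
        return DDP(
            model.to(local_rank),
            device_ids=[local_rank],
            output_device=local_rank,
            find_unused_parameters=True,
        )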

tzco commented

I think it is the parameter "empty_text_embed" in diffusion_transformer.py that doesn't receive grads. "empty_text_embed" is for learnable classifier-free guidance, and sorry, we forgot to check this case. You can guard line 148 of diffusion_transformer.py with the if below and try again:

if learnable_cf:
    self.empty_text_embed = torch.nn.Parameter(torch.randn(size=(77, 512), requires_grad=True, dtype=torch.float64))
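
If it helps, a quick way to confirm which parameters are still unused after a backward pass (a generic sketch, not code from the repo):

    import torch

    def report_unused_parameters(model: torch.nn.Module):
        # parameters whose .grad is still None after loss.backward() never
        # participated in producing the loss
        unused = [name for name, p in model.named_parameters()
                  if p.requires_grad and p.grad is None]
        for name in unused:
            print(f"no gradient: {name}")
        return unused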

Thanks for your reply.