richzhang/PerceptualSimilarity

RuntimeError: Function 'SqrtBackward0' returned nan values in its 0th output

Can-Zhao opened this issue · 4 comments

Hi,

I got RuntimeError: Function 'SqrtBackward0' returned nan values in its 0th output
raised from this line: feats0[kk], feats1[kk] = lpips.normalize_tensor(outs0[kk]), lpips.normalize_tensor(outs1[kk])
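Below is a minimal reproduction sketch. The normalize_tensor in it is a local copy written for illustration, not imported from lpips. It assumes the lpips helper takes the square root of the channel-wise sum of squares and adds eps (assumed default 1e-10) only to the divisor. The error text itself comes from PyTorch's anomaly detection mode, which checks the outputs of every backward function for NaN. Without that mode, the NaN silently propagates into the gradients instead of raising.

```python
import torch


def normalize_tensor(in_feat, eps=1e-10):
    # Local copy for illustration (assumed to mirror lpips.normalize_tensor):
    # eps is added *after* the sqrt, so it only guards the forward division.
    norm_factor = torch.sqrt(torch.sum(in_feat ** 2, dim=1, keepdim=True))
    return in_feat / (norm_factor + eps)


def backward_error(feat):
    """Backprop through normalize_tensor under anomaly detection.

    Returns the RuntimeError message, or None if backward succeeded.
    """
    feat = feat.clone().requires_grad_(True)
    try:
        with torch.autograd.detect_anomaly():
            normalize_tensor(feat).sum().backward()
    except RuntimeError as err:
        return str(err)
    return None


# An all-zero channel vector, e.g. a ReLU feature map that is zero everywhere.
print(backward_error(torch.zeros(1, 3, 2, 2)))  # message mentions SqrtBackward0
print(backward_error(torch.ones(1, 3, 2, 2)))   # None: non-zero features are fine
```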

It seems that this issue might be solved by changing

norm_factor = torch.sqrt(torch.sum(in_feat**2,dim=1,keepdim=True))

to norm_factor = torch.sqrt(torch.sum(in_feat**2,dim=1,keepdim=True) + eps)
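Here is a sketch of the proposed change, applied to a local copy of the helper. It assumes the helper matches lpips.normalize_tensor apart from this one line, and the default eps=1e-10 is also an assumption. With eps inside the sqrt, the argument of torch.sqrt is at least eps, so its backward never divides zero by zero.

```python
import torch


def normalize_tensor(in_feat, eps=1e-10):
    # Proposed fix: eps goes *inside* the sqrt, so sqrt's argument is >= eps and
    # its backward, grad / (2 * sqrt(...)), never evaluates 0 / 0.
    norm_factor = torch.sqrt(torch.sum(in_feat ** 2, dim=1, keepdim=True) + eps)
    return in_feat / (norm_factor + eps)


feat = torch.zeros(1, 3, 2, 2, requires_grad=True)
with torch.autograd.detect_anomaly():  # would raise if any backward op produced NaN
    normalize_tensor(feat).sum().backward()
print(torch.isfinite(feat.grad).all().item())  # True
```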

Thanks for posting this. I've been struggling to figure out what the 'SqrtBackward0' error means and how to fix it. Perhaps an all-zero feature output, which makes the argument of torch.sqrt zero, causes numerical instability in its backward pass!
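That suspicion can be checked with a worked derivation. Take a single spatial location with channel vector x and upstream gradient g = ∂L/∂y. As far as I know, PyTorch's backward for torch.sqrt computes grad / (2 * sqrt(s)) from the stored forward result, so any zeros in that expression are divided literally:

```latex
\begin{aligned}
s &= \sum_c x_c^2, \qquad n = \sqrt{s}, \qquad y_c = \frac{x_c}{n + \epsilon},\\
\frac{\partial L}{\partial n} &= -\sum_c \frac{g_c\, x_c}{(n + \epsilon)^2},
\qquad
\frac{\partial L}{\partial s} = \frac{\partial L}{\partial n} \cdot \frac{1}{2\sqrt{s}}.
\end{aligned}
```

If x = 0, then ∂L/∂n = 0 and 2√s = 0, so the sqrt backward evaluates 0/0 = NaN. That NaN is the SqrtBackward0 output the anomaly check flags. Moving ε inside the root replaces 1/(2√s) with 1/(2√(s + ε)) ≤ 1/(2√ε), which stays finite.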

Perhaps a better option would be to fix torch.sqrt itself, since in my case I call torch.sqrt directly in my model.
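For code that calls torch.sqrt directly, one option is to wrap the call in a small helper rather than change torch.sqrt itself. The safe_sqrt name and its eps=1e-8 default are hypothetical choices for illustration. The sketch also uses clamping instead of adding eps, so values at or above eps pass through unchanged.

```python
import torch


def safe_sqrt(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Hypothetical helper, not a PyTorch API. Clamping keeps sqrt's argument
    # >= eps, so its backward divides by at least 2 * sqrt(eps); entries that
    # were clamped receive zero gradient from the clamp instead of NaN/inf.
    return torch.sqrt(torch.clamp(x, min=eps))


x = torch.tensor([0.0, 4.0], requires_grad=True)
safe_sqrt(x).sum().backward()
# Gradient is 0 for the clamped zero entry and 1 / (2 * sqrt(4)) = 0.25 for the other.
print(x.grad)
```

The two approaches make different trade-offs. Compared with torch.sqrt(x + eps), clamping leaves values at or above eps untouched but passes zero gradient below eps. Adding eps keeps a gradient everywhere at the cost of a small bias near zero. Which one fits depends on the model.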

Facing the same issue. @Can-Zhao, did adding eps solve it?

Is there a proposed PR for this? Should I make one?

Hello,

I faced the same problem and fixed it by changing the line to:
norm_factor = torch.sqrt(torch.sum(x ** 2, dim=1, keepdim=True) + 1e-8)
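A quick check of whether this change affects the metric for ordinary inputs is sketched below. It uses local copies of the helper: the original form is assumed from the line quoted at the top of the thread, and the names normalize_original / normalize_patched are hypothetical. The random input and tolerance are arbitrary choices for illustration.

```python
import torch


def normalize_original(x, eps=1e-10):
    # eps only in the divisor (form quoted at the top of the thread)
    norm = torch.sqrt(torch.sum(x ** 2, dim=1, keepdim=True))
    return x / (norm + eps)


def normalize_patched(x, eps=1e-10, sqrt_eps=1e-8):
    # extra 1e-8 inside the sqrt, as in the fix above
    norm = torch.sqrt(torch.sum(x ** 2, dim=1, keepdim=True) + sqrt_eps)
    return x / (norm + eps)


torch.manual_seed(0)
feats = torch.randn(2, 64, 8, 8)
# For non-zero features the extra 1e-8 is far below float32 resolution of the
# squared norm, so the outputs should agree to within the tolerance.
print(torch.allclose(normalize_original(feats), normalize_patched(feats), atol=1e-6))
```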

It's a big problem that also affects other libraries, @richzhang:
Lightning-AI/pytorch-lightning#18712