dotnet/TorchSharp

`Module.to(Device)` will cause warnings because non-leaf tensors' `.grad` is accessed (also for optimizers)

yueyinqiu opened this issue · 6 comments

That's a bit tricky.

We can't just check `param.requires_grad`, because the parameter might be a non-leaf tensor that retains its grad. I'm also not sure whether there are any other situations.
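
For example, reading `.grad` on a non-leaf tensor that retains its grad is valid and does not warn (a quick PyTorch sketch, not from the original report):

import torch

x = torch.zeros([], requires_grad=True)
y = x * 2          # y is a non-leaf tensor
y.retain_grad()    # ...but it is marked to retain its grad
y.backward()
print(y.is_leaf)   # False
print(y.grad)      # tensor(1.), accessed without any warning

So a non-leaf parameter can still carry a grad that needs to be moved.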

Yes, there are... We can simply do this in PyTorch:

import torch

x = torch.zeros([], requires_grad=False)
x.grad = torch.zeros([]) + 10  # a grad can be assigned by hand even though requires_grad is False
print(x.grad)  # tensor(10.)

So we can never know whether there is a grad before we truly access it...

Perhaps the only solution is to add a new C++ method for this, or to find a way to temporarily suppress the warnings.

Would it be enough to check tensor.is_leaf before accessing grad in the toEpilog() logic?
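
In PyTorch terms, that check would look roughly like the sketch below (the actual toEpilog() code is C#, so this is only an illustration of the idea, and as noted above it would still miss grads on non-leaf tensors):

import torch

def grad_if_leaf(t):
    # Proposed guard: only read .grad on leaf tensors, since reading it
    # on a non-leaf tensor emits a UserWarning.
    # Non-leaf tensors that retain_grad() would be skipped by this check,
    # and a manually assigned .grad cannot be detected without reading it.
    return t.grad if t.is_leaf else None

p = torch.zeros(3, requires_grad=True)
print(grad_if_leaf(p))  # None until backward() populates it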

I'm confused -- all the instances of .grad in Module are accessing parameters. How can they be non-leaf tensors?

Hmmm... This issue was created because of #1322. In that issue, `this.conv.weight[..] = nn.Parameter(x.view(1, c1, 1, 1));` was used, which made the weight a non-leaf tensor.

Well, I have just reconsidered this... I suppose it's a misuse of slicing, and a warning is exactly what we want in this situation.

So, should #1322 also be closed, then?