Dealing with aliasing
Myia is currently not able to handle aliased tensors in data structures. This issue can crop up in the PyTorch frontend, in code like this:
```python
import torch

class LinearSeq(torch.nn.Module):
    def __init__(self, a, b):
        super(LinearSeq, self).__init__()
        self.lin = torch.nn.Linear(a, b)
        # self.seq[0] and self.lin refer to the same module
        self.seq = torch.nn.Sequential(self.lin)

    def forward(self, x):
        return self.seq(x)
```
The problem is that Myia sees both `self.lin` and `self.seq[0]`, but it understands them as two different parameters rather than one and the same. Thus, if `forward` only uses `self.seq`, the gradient with respect to `self.lin` is zero, and the update will be applied to `seq` but not to `lin`. Furthermore, if both `seq` and `lin` are used, they will accumulate gradients separately and their values will diverge.
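The double-counting can be reproduced outside Myia: a traversal that collects tensors per structural path sees two parameters where there is only one. A minimal pure-Python sketch (plain dicts stand in for tensors; all names here are illustrative, not Myia's API):

```python
class LinearSeq:
    """Stand-in for the PyTorch module above: the same object is
    reachable via two structural paths, `self.lin` and `self.seq[0]`."""
    def __init__(self):
        self.lin = {"weight": [1.0, 2.0]}   # stand-in for nn.Linear
        self.seq = [self.lin]               # stand-in for nn.Sequential

def collect_by_path(module):
    """Naive traversal: one entry per structural path."""
    return [("lin", module.lin), ("seq[0]", module.seq[0])]

def collect_unique(module):
    """Deduplicate by object identity, the way PyTorch's
    Module.parameters() avoids double-counting shared weights."""
    seen, unique = set(), []
    for _, p in collect_by_path(module):
        if id(p) not in seen:
            seen.add(id(p))
            unique.append(p)
    return unique

m = LinearSeq()
assert m.seq[0] is m.lin               # both paths alias one object
assert len(collect_by_path(m)) == 2    # path view: two "parameters"
assert len(collect_unique(m)) == 1     # identity view: one parameter
```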
This is a difficult problem, and if we handle it, I believe it would be best to consider the aliasing patterns statically (by which I mean specialize graphs wrt aliasing patterns). The fact that two tensors in opposite corners of a data structure may be aliased seems particularly difficult to deal with, but maybe we can get away with only supporting a few simple patterns.
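One way to make "specialize graphs wrt aliasing patterns" concrete is to compute, at the frontend boundary, a signature that maps each structural path to the canonical (first-seen) path reaching the same underlying tensor; compiled graphs could then be cached per signature. A hypothetical sketch (the function name and representation are assumptions, not Myia's API):

```python
def aliasing_signature(paths_and_objs):
    """Map each structural path to the canonical (first-seen) path
    that reaches the same underlying object. Two inputs get the same
    signature iff they alias in exactly the same pattern, so the
    signature can key a cache of specialized graphs."""
    canonical = {}   # id(obj) -> first path that reached it
    sig = {}
    for path, obj in paths_and_objs:
        canonical.setdefault(id(obj), path)
        sig[path] = canonical[id(obj)]
    return sig

lin, other = object(), object()

# Model with aliasing: seq[0] is the very same object as lin.
aliased = [("lin", lin), ("seq[0]", lin)]
# Model without aliasing: two distinct linear layers.
distinct = [("lin", lin), ("seq[0]", other)]

assert aliasing_signature(aliased) == {"lin": "lin", "seq[0]": "lin"}
assert aliasing_signature(distinct) == {"lin": "lin", "seq[0]": "seq[0]"}
```

Supporting "a few simple patterns" would then amount to rejecting (or falling back on) signatures outside a whitelist, e.g. only aliases between top-level attributes.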
So the question is, how do we deal with this?