mila-iqia/myia

Dealing with aliasing


Myia is currently unable to handle aliased tensors in data structures. The issue can crop up in the PyTorch frontend, in code like this:

import torch

class LinearSeq(torch.nn.Module):
    def __init__(self, a, b):
        super(LinearSeq, self).__init__()
        self.lin = torch.nn.Linear(a, b)
        # self.seq[0] is the very same module object as self.lin
        self.seq = torch.nn.Sequential(self.lin)

    def forward(self, x):
        return self.seq(x)

The problem is that Myia sees both self.lin and self.seq[0], but it treats them as distinct parameters rather than as the same one. Thus, if forward only uses self.seq, the gradient with respect to self.lin is zero, and the update is applied to seq but not to lin. Worse, if both seq and lin are used, they accumulate gradients separately and their values diverge.
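A minimal, framework-free sketch of the failure mode (the `Param` class and the path-keyed gradient dict are hypothetical stand-ins, not Myia internals): when gradient slots are keyed by *path* through the structure, the two views of one parameter get separate accumulators; keying by object identity instead collapses them into one.

```python
# Hypothetical sketch: the same parameter object reachable via two paths.
class Param:
    def __init__(self, value):
        self.value = value

lin = Param(1.0)
model = {"lin": lin, "seq": [lin]}  # "seq[0]" aliases "lin"

# Naive view: one gradient slot per *path*, as if the parameters were
# distinct. If forward only used seq[0], the "lin" slot stays zero.
grads_by_path = {"lin": 0.0, "seq[0]": 0.5}

# Alias-aware view: canonicalize slots by object identity before
# accumulating, so both paths feed a single gradient.
grads_by_id = {}
for path, param in [("lin", model["lin"]), ("seq[0]", model["seq"][0])]:
    grads_by_id.setdefault(id(param), 0.0)
grads_by_id[id(model["seq"][0])] += 0.5

assert len(grads_by_id) == 1        # one real parameter underneath
assert grads_by_id[id(lin)] == 0.5  # the full gradient, accumulated once
```

The path-keyed dict illustrates why applying per-path updates is order-dependent and wrong: both "updates" write to the same underlying object.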

This is a difficult problem, and if we handle it, I believe it would be best to consider the aliasing patterns statically (that is, specialize graphs with respect to the aliasing patterns of their inputs). The fact that two tensors in opposite corners of a data structure may be aliased seems particularly difficult to deal with, but maybe we can get away with supporting only a few simple patterns.
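One way to make "specialize statically on aliasing patterns" concrete is to derive a specialization key from the structure: assign each leaf a canonical index, with aliased leaves sharing an index. This is only a sketch of the idea (the function name and traversal order are my own, not anything in Myia):

```python
# Hypothetical sketch: compute an "aliasing pattern" for a nested
# structure, usable as a key when specializing a graph.
def aliasing_pattern(structure):
    """Map each leaf position to a canonical index: leaves that are the
    same object share an index. Two structures with equal patterns can
    reuse the same graph, specialized for that aliasing layout."""
    canonical = {}  # id(leaf) -> canonical index
    pattern = []

    def walk(node):
        if isinstance(node, (list, tuple)):
            for child in node:
                walk(child)
        elif isinstance(node, dict):
            for key in sorted(node):  # deterministic traversal order
                walk(node[key])
        else:  # a leaf, e.g. a tensor
            pattern.append(canonical.setdefault(id(node), len(canonical)))

    walk(structure)
    return tuple(pattern)

w = object()
aliased = {"lin": w, "seq": [w]}                 # same leaf in two places
distinct = {"lin": object(), "seq": [object()]}  # no aliasing

assert aliasing_pattern(aliased) == (0, 0)
assert aliasing_pattern(distinct) == (0, 1)
```

Supporting "a few simple patterns" could then amount to accepting only keys of a restricted shape and rejecting (or falling back on) anything more exotic, such as aliases between distant corners of the structure.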

So the question is, how do we deal with this?