princeton-vl/lietorch

Error during backward if groups are created with wrong data shapes

Xanthorapedia opened this issue · 1 comment

Hi, thanks for creating this wonderful library!❤️ However, I ran into the following bug when using it with torch.autograd.

For example, if I try to find the Jacobian w.r.t. a group parameter:

import torch
from lietorch import SE3, LieGroupParameter

x = LieGroupParameter(SE3.Identity(1))
p = torch.ones(1000, 3)

def func1(x_, p_):
    # have to wrap with lietorch.SE3 because arguments are passed as tensors
    return SE3(x_).act(p_).sum()

# throws error: double free or corruption (!prev)
print(torch.autograd.functional.jacobian(func1, (x, p)))

The problem seems to be that during its preprocessing stage, torch.autograd clones the input data, but for a LieGroupParameter this retrieves only the tangent-space data (one dimension short of the embedded group representation). Somehow the forward pass accepted those shorter inputs, and no error surfaced until the backward call. The same error appears if I do:
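To make the shape mismatch concrete, here is a minimal pure-Python sketch of the dimension bookkeeping as I understand it (the dimensions are from SE3 itself; everything else here is my own illustration, not lietorch code):

```python
# My understanding of SE3's two representations (illustration only):
embedded_dim = 7   # translation (3) + unit quaternion (4)
tangent_dim = 6    # se(3) tangent vector: 3 translational + 3 rotational

# torch.autograd.functional.jacobian clones its inputs. For a
# LieGroupParameter the clone seems to yield the tangent-space tensor,
# so SE3(x_) receives shape (1, 6) where the kernels assume (1, 7).
cloned = (1, tangent_dim)     # what the forward pass actually received
expected = (1, embedded_dim)  # what the group's kernels index into

# Reading embedded_dim values per element from a tangent_dim-wide buffer
# runs past the end of the allocation, which would be consistent with
# the heap corruption only showing up at backward time.
overrun = expected[-1] - cloned[-1]
print(overrun)  # → 1 element out of bounds per group element
```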

SE3(torch.zeros(1, 6, requires_grad=True)).act(torch.ones(1000, 3)).sum().backward()  # double free or corruption (!prev)

It would be nice if you could look into the above error. It would also probably be a good idea for LieGroup.__init__ or the op implementations to sanity-check the input dimensions so the problem is caught earlier. Otherwise, thank you for your hard work!😃
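For reference, a minimal sketch of the kind of shape check I have in mind (hypothetical helper, not lietorch API): SE3's embedded representation stores 7 numbers per element, so a constructor could reject data with the wrong trailing dimension before any kernel runs:

```python
# Hypothetical sanity check, not actual lietorch code.
# SE3's embedded representation has 7 numbers per element
# (translation + unit quaternion); its tangent space has only 6.
SE3_EMBEDDED_DIM = 7

def check_group_data(shape, embedded_dim=SE3_EMBEDDED_DIM):
    """Raise a clear error early instead of corrupting memory later."""
    if len(shape) < 1 or shape[-1] != embedded_dim:
        raise ValueError(
            f"expected trailing dimension {embedded_dim} for group data, "
            f"got shape {tuple(shape)}")

# A (1, 6) tensor of tangent-space coordinates would be rejected up front:
try:
    check_group_data((1, 6))
except ValueError as e:
    print(e)

# while valid (1, 7) embedded data passes silently:
check_group_data((1, 7))
```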

PS: For my hypothetical use case above, I was able to work around it by:

def func2(x_, p_):
    return SE3.Identity(x_.shape[:-1]).retr(x_).act(p_).sum()

print(torch.autograd.functional.jacobian(func2, (x, p)))

Output:

(tensor([[1000., 1000., 1000.,    0.,    0.,    0.]]), tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        ...,
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]))