princeton-vl/lietorch

Possible bug in gradient size of mul_backward_kernel()?


Hi guys,

First of all, congratulations on this great work and thank you for making the code open-source.

While going through the implementation details, I noticed that in mul_backward_kernel() the gradient of type Grad is constructed as

Grad dZ(grad + i*Group::N);

so it is read from memory with an offset of N per element, where N appears to be the embedding size (or representation size). Shouldn't it instead be read with dimension K, as in

using Grad = Eigen::Matrix<scalar_t,1,Group::K>;

where K is the dimension of the tangent space, which is what satisfies the chain rule described in Eq. (14) of the paper?
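To make the question concrete, here is a minimal standalone sketch of what the two offsets would mean. It is not lietorch code: the values K = 6 and N = 7 and the helpers `make_buffer` and `read_grad` are hypothetical, chosen only for illustration. It shows that reading K entries at offset `i*N` is consistent only if each gradient occupies an N-wide slot in the buffer, while offset `i*K` is consistent only if the gradients are tightly packed. Which layout the kernel actually receives is the open question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dimensions, assumed for illustration only.
constexpr std::size_t K = 6;  // tangent-space dimension
constexpr std::size_t N = 7;  // embedding (storage) dimension

// Build a flat buffer holding `batch` gradients, each starting `stride`
// scalars after the previous one. Entry j of element i is 10*i + j;
// any unused slots (when stride > K) stay zero.
std::vector<float> make_buffer(std::size_t batch, std::size_t stride) {
    std::vector<float> buf(batch * stride, 0.0f);
    for (std::size_t i = 0; i < batch; ++i)
        for (std::size_t j = 0; j < K; ++j)
            buf[i * stride + j] = static_cast<float>(10 * i + j);
    return buf;
}

// Read the K gradient entries of element i, assuming elements are
// `stride` scalars apart. The caller must ensure i*stride + K <= buf.size().
std::vector<float> read_grad(const std::vector<float>& buf,
                             std::size_t i, std::size_t stride) {
    std::vector<float> g(K);
    for (std::size_t j = 0; j < K; ++j)
        g[j] = buf[i * stride + j];
    return g;
}
```

With a buffer built by `make_buffer(2, N)`, `read_grad(buf, 1, N)` returns element 1 intact, whereas `read_grad(buf, 1, K)` starts one scalar too early and picks up the unused slot of element 0. With a tightly packed buffer the roles are reversed.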

Please let me know whether this observation is correct or whether I have mistaken intended behavior for a bug. Thanks in advance. Keep up the great work :)