Possible bug in gradient size of mul_backward_kernel()?

Hi guys,

First of all, congratulations on this great work and thank you for making the code open-source.

While going through the implementation details, I noticed that the Grad value dZ in the mul_backward_kernel() function in

Line 124 in 0fa9ce8

Grad dZ(grad + i*Group::N);

is read from memory at an offset of i*Group::N, where N appears to be the embedding size (or the representation size). Shouldn't it instead be read using dimension K, as in

lietorch/lietorch/src/lietorch_cpu.cpp

Line 119 in 0fa9ce8

using Grad = Eigen::Matrix<scalar_t,1,Group::K>;

which is the dimension of the tangent space, as required by the chain rule described in Eq. (14) of the paper?
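
To make the question concrete, here is a minimal standalone sketch (not the library's code) of how the row offset interacts with the buffer layout. ToyGroup, its values K = 6 and N = 7, and the helper functions are all hypothetical and only for illustration. I do not know which layout the real grad buffer uses: if each row is padded to N entries, the offset i*N would be consistent even though only K entries are read; if rows are packed with K entries each, it would not be.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a group type; K and N are assumed values.
struct ToyGroup {
    static constexpr std::size_t K = 6;  // tangent-space dimension
    static constexpr std::size_t N = 7;  // embedding dimension
};

// Build a flat buffer in which row i stores the value i in its first K slots.
// Rows are spaced `stride` entries apart; unused slots are marked with -1.
std::vector<int> make_grad_buffer(std::size_t rows, std::size_t stride) {
    assert(stride >= ToyGroup::K);
    std::vector<int> buf(rows * stride, -1);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t k = 0; k < ToyGroup::K; ++k) {
            buf[i * stride + k] = static_cast<int>(i);
        }
    }
    return buf;
}

// Read K entries starting at offset i * read_stride for every row and check
// that each read returns only the values belonging to row i.
bool reads_are_consistent(const std::vector<int>& buf, std::size_t rows,
                          std::size_t read_stride) {
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t k = 0; k < ToyGroup::K; ++k) {
            const std::size_t idx = i * read_stride + k;
            if (idx >= buf.size() || buf[idx] != static_cast<int>(i)) {
                return false;
            }
        }
    }
    return true;
}
```

With these toy helpers, a buffer built with stride N reads back correctly at offset i*N, while a buffer packed with stride K does not, so the answer to my question depends on how grad is actually laid out in memory.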

Please let me know whether this observation is correct or whether I have misunderstood something. Thanks in advance. Keep up the great work :)