alibaba/Curvature-Learning-Framework

Why does the Adam optimizer accumulate the first and second moments with plain Euclidean addition?

talorwu opened this issue · 8 comments

Why is plain addition used here? My understanding is that the addition should be done via the exponential map.

Also, shouldn't the multiplication use the curvature space's scalar multiplication as well?

This implements the algorithm shown in Figure 1 of Riemannian Adaptive Optimization Methods, which guarantees convergence. You could certainly try the exp map instead; it might perform better, and contributions are welcome~
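For reference, the update in Figure 1 of that paper can be sketched roughly as below. The `manifold` interface (`egrad2rgrad`, `exp`, `transport`, `norm2`) is a hypothetical abstraction for illustration, not this repository's API, and bias correction is omitted as in the paper's figure:

```python
# Sketch of a RADAM-style step on a product of submanifolds,
# loosely following Figure 1 of "Riemannian Adaptive Optimization
# Methods". All manifold method names here are assumptions.
import numpy as np

def radam_step(x, grad, m, v, manifold, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    g = manifold.egrad2rgrad(x, grad)               # Riemannian gradient at x
    m = b1 * m + (1.0 - b1) * g                     # first moment; m was already
                                                    # transported to x last step
    v = b2 * v + (1.0 - b2) * manifold.norm2(x, g)  # one scalar per submanifold
    x_new = manifold.exp(x, -lr * m / (np.sqrt(v) + eps))
    m = manifold.transport(x, x_new, m)             # the isometry phi; this
                                                    # transported m is tau_t
    return x_new, m, v

class Euclidean:
    """Each coordinate treated as its own 1-D submanifold, so the
    update reduces to ordinary (bias-uncorrected) Adam."""
    def egrad2rgrad(self, x, g): return g
    def norm2(self, x, g): return g * g
    def exp(self, x, u): return x + u
    def transport(self, x, y, m): return m          # trivial in flat space
```

In the Euclidean instance, parallel transport is the identity and the squared norm is per-coordinate, so one step reproduces the familiar (uncorrected) Adam update.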


The implementation seems off: your code doesn't distinguish between m and \tau.

One more issue: according to the paper's assumptions, each component of the second moment must be computed separately; they cannot be computed jointly.

The implementation seems off: your code doesn't distinguish between m and \tau.

Actually, the Identity function is the simplest isometry. In practice, we notice that the identity function could provide acceptable accuracy, which also brings good training efficiency. We agree that complicated isometry may lead to higher performance on specific tasks. Hope that further investigation can be conducted upon this base implementation :) Thanks for pointing out this issue.
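A small sketch of why the choice of isometry matters: on the unit sphere, keeping the momentum vector unchanged (identity \varphi) generally leaves it outside the tangent space at the new point, whereas parallel transport keeps it tangent. The closed-form sphere formulas below are the standard ones, not code from this framework:

```python
# Identity vs. parallel transport of a momentum vector on the unit
# sphere S^2. x is the current point, u the step direction, w the
# momentum; all vectors are toy values chosen for illustration.
import numpy as np

def sphere_exp(x, u):
    """Exponential map on the sphere: follow the geodesic from x with velocity u."""
    t = np.linalg.norm(u)
    return np.cos(t) * x + np.sin(t) * u / t

def sphere_transport(x, u, w):
    """Parallel-transport w along the geodesic leaving x with velocity u."""
    t = np.linalg.norm(u)
    e = u / t
    return w + (e @ w) * ((np.cos(t) - 1.0) * e - np.sin(t) * x)

x = np.array([1.0, 0.0, 0.0])
u = np.array([0.0, 0.7, 0.0])   # step direction, tangent at x
w = np.array([0.0, 0.3, 0.4])   # momentum, tangent at x
y = sphere_exp(x, u)            # new point after the step

print(abs(y @ w))                        # identity phi: clearly nonzero
print(abs(y @ sphere_transport(x, u, w)))  # transported: ~0, still tangent
```

Curvature is what makes the difference: in flat space both choices coincide, which is one way to read why the identity still works acceptably here.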

One more issue: according to the paper's assumptions, each component of the second moment must be computed separately; they cannot be computed jointly.

Please refer to the last paragraph on page 5 of the paper, which states that under certain conditions (e.g., the simplest condition) the second moment can be computed jointly.
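A toy illustration of the distinction being discussed, assuming the "simplest condition" amounts to treating each coordinate as its own one-dimensional submanifold. With a single joint second moment, every coordinate is rescaled by the same factor; with per-coordinate second moments, the preconditioning is anisotropic, as in standard Adam:

```python
# Joint vs. per-coordinate second moment, one step from v = 0.
# The gradient is deliberately ill-scaled to make the contrast visible.
import numpy as np

g = np.array([10.0, 0.1])
b2, eps = 0.999, 1e-8

v_joint = (1 - b2) * np.dot(g, g)  # one scalar for the whole parameter
v_per = (1 - b2) * g * g           # one scalar per coordinate

step_joint = g / (np.sqrt(v_joint) + eps)  # preserves the 100:1 ratio
step_per = g / (np.sqrt(v_per) + eps)      # near-equal magnitudes per coord
```

Under the per-coordinate view the adaptive denominator equalizes the step sizes, which is exactly the Adam-like behavior the product-manifold formulation is meant to recover.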

Problem solved.

Thanks for pointing out the possible issues.
For efficiency, we implemented \varphi as the identity function and optimized each submanifold uniformly.
We will fix this in a later version to stay consistent with the paper.