alibaba/Curvature-Learning-Framework

Why does the Adam optimizer accumulate the first and second moments with plain Euclidean addition?

talorwu opened this issue · 8 comments

Why is plain addition used here? My understanding is that the addition should be done via the exponential map.

Also, shouldn't the multiplication use the curvature space's scalar multiplication as well?

This implements the algorithm shown in Figure 1 of Riemannian Adaptive Optimization Methods, which guarantees convergence. You could certainly try the exp map instead; it might perform better, and contributions are welcome~
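For reference, the update in Figure 1 of that paper can be sketched roughly as below. The `manifold` interface (`egrad2rgrad`, `exp`, `transport`, `norm2`) is a hypothetical abstraction for illustration, not this repository's API, and bias correction is omitted as in the paper's figure:

```python
# Sketch of a RADAM-style step on a product of submanifolds,
# loosely following Figure 1 of "Riemannian Adaptive Optimization
# Methods". All manifold method names here are assumptions.
import numpy as np

def radam_step(x, grad, m, v, manifold, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    g = manifold.egrad2rgrad(x, grad)               # Riemannian gradient at x
    m = b1 * m + (1.0 - b1) * g                     # first moment; m was already
                                                    # transported to x last step
    v = b2 * v + (1.0 - b2) * manifold.norm2(x, g)  # one scalar per submanifold
    x_new = manifold.exp(x, -lr * m / (np.sqrt(v) + eps))
    m = manifold.transport(x, x_new, m)             # the isometry phi; this
                                                    # transported m is tau_t
    return x_new, m, v

class Euclidean:
    """Each coordinate treated as its own 1-D submanifold, so the
    update reduces to ordinary (bias-uncorrected) Adam."""
    def egrad2rgrad(self, x, g): return g
    def norm2(self, x, g): return g * g
    def exp(self, x, u): return x + u
    def transport(self, x, y, m): return m          # trivial in flat space
```

In the Euclidean instance, parallel transport is the identity and the squared norm is per-coordinate, so one step reproduces the familiar (uncorrected) Adam update.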


The implementation seems off: your code doesn't distinguish between m and \tau.

One more issue: according to the paper's assumptions, each component of the second moment must be computed separately; they cannot be computed jointly.

The implementation seems off: your code doesn't distinguish between m and \tau.

Actually, the Identity function is the simplest isometry. In practice, we notice that the identity function could provide acceptable accuracy, which also brings good training efficiency. We agree that complicated isometry may lead to higher performance on specific tasks. Hope that further investigation can be conducted upon this base implementation :) Thanks for pointing out this issue.
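A small sketch of why the choice of isometry matters: on the unit sphere, keeping the momentum vector unchanged (identity \varphi) generally leaves it outside the tangent space at the new point, whereas parallel transport keeps it tangent. The closed-form sphere formulas below are the standard ones, not code from this framework:

```python
# Identity vs. parallel transport of a momentum vector on the unit
# sphere S^2. x is the current point, u the step direction, w the
# momentum; all vectors are toy values chosen for illustration.
import numpy as np

def sphere_exp(x, u):
    """Exponential map on the sphere: follow the geodesic from x with velocity u."""
    t = np.linalg.norm(u)
    return np.cos(t) * x + np.sin(t) * u / t

def sphere_transport(x, u, w):
    """Parallel-transport w along the geodesic leaving x with velocity u."""
    t = np.linalg.norm(u)
    e = u / t
    return w + (e @ w) * ((np.cos(t) - 1.0) * e - np.sin(t) * x)

x = np.array([1.0, 0.0, 0.0])
u = np.array([0.0, 0.7, 0.0])   # step direction, tangent at x
w = np.array([0.0, 0.3, 0.4])   # momentum, tangent at x
y = sphere_exp(x, u)            # new point after the step

print(abs(y @ w))                        # identity phi: clearly nonzero
print(abs(y @ sphere_transport(x, u, w)))  # transported: ~0, still tangent
```

Curvature is what makes the difference: in flat space both choices coincide, which is one way to read why the identity still works acceptably here.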

One more issue: according to the paper's assumptions, each component of the second moment must be computed separately; they cannot be computed jointly.

Please refer to the last paragraph on page 5 of the paper, which states that under certain conditions (e.g., the simplest condition) the second moment can be computed jointly.
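A toy illustration of the distinction being discussed, assuming the "simplest condition" amounts to treating each coordinate as its own one-dimensional submanifold. With a single joint second moment, every coordinate is rescaled by the same factor; with per-coordinate second moments, the preconditioning is anisotropic, as in standard Adam:

```python
# Joint vs. per-coordinate second moment, one step from v = 0.
# The gradient is deliberately ill-scaled to make the contrast visible.
import numpy as np

g = np.array([10.0, 0.1])
b2, eps = 0.999, 1e-8

v_joint = (1 - b2) * np.dot(g, g)  # one scalar for the whole parameter
v_per = (1 - b2) * g * g           # one scalar per coordinate

step_joint = g / (np.sqrt(v_joint) + eps)  # preserves the 100:1 ratio
step_per = g / (np.sqrt(v_per) + eps)      # near-equal magnitudes per coord
```

Under the per-coordinate view the adaptive denominator equalizes the step sizes, which is exactly the Adam-like behavior the product-manifold formulation is meant to recover.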

Problem solved.

Thanks for pointing out the possible issues.
For efficiency, we implemented \varphi as the identity function and optimized each submanifold uniformly.
We will fix this in a later version to stay consistent with the paper.