lezcano/expRNN

About a statement in the paper


Hi, thank you for your great code.
I am reading the paper. It is quite interesting!
I have a question about the paper rather than the code.
While most of the derivations are understandable to me, I cannot understand the following statement:
“If the parametrization did not induce a change of metric on the manifold it could mean that it would induce saddle points”
I see why it is important to introduce a metric that keeps the landscape free of additional saddle points, but how can I derive this property from the assumption?
Thank you in advance:)

Hi Petroskey,

I am glad you are interested in the paper!!

To see that a parametrization which does not induce a change of metric could introduce saddle points, it is enough to exhibit an example of this behaviour. Consider parametrizing the interval [-1, 1] via the sin function. Then, for a function f: [-1, 1] -> R, we have
f \circ sin : R -> R
and its gradient (derivative in this case)
(f \circ sin)'(x) = f'(sin(x))cos(x).
This derivative is zero at the points where f'(sin(x)) is zero or where cos(x) = 0. So it could be that f' is not zero at sin(pi/2) = 1, and yet (f \circ sin)'(pi/2) = 0. This would be an artificial critical point created by the parametrization (a local minimum, a local maximum or a saddle point).
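The chain-rule argument above can be checked numerically. The snippet below is only a minimal sketch: the objective f(y) = y is a hypothetical choice made for illustration (it is not taken from the paper), picked because its derivative is never zero, so any zero of the composed derivative must come from the sin parametrization.

```python
import math


def f_prime(y):
    # Derivative of the hypothetical objective f(y) = y on [-1, 1].
    # It equals 1 everywhere, so f itself has no critical points.
    return 1.0


def composed_prime(x):
    # Chain rule: (f o sin)'(x) = f'(sin(x)) * cos(x)
    return f_prime(math.sin(x)) * math.cos(x)


x = math.pi / 2
print("f'(sin(x))     =", f_prime(math.sin(x)))
print("(f o sin)'(x)  =", composed_prime(x))
```

The first value is 1.0, while the second is zero up to floating-point rounding: the critical point at x = pi/2 comes entirely from cos(x) vanishing there.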

Note that these are the points at which sin(x) fails to be a local diffeomorphism and, as such, does not act as a change of metric. The same reasoning extends to higher dimensions, but rather than looking at the derivative of the parametrization, we would look at its differential and check whether it is full rank.
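A two-dimensional version of the same example might look as follows. This is a sketch under assumptions that are not in the paper: the parametrization phi(x1, x2) = (sin x1, sin x2) of the square [-1, 1]^2 and the objective f(y1, y2) = y1 + y2 are both hypothetical, and the helper names are made up for illustration. The objective has gradient (1, 1) everywhere, yet the composed function f(phi(x)) = sin x1 + sin x2 has a genuine saddle point where the differential of phi loses rank.

```python
import math


def jacobian_diag(x1, x2):
    # The differential of phi(x1, x2) = (sin x1, sin x2) is the
    # diagonal matrix diag(cos x1, cos x2); return its diagonal.
    return (math.cos(x1), math.cos(x2))


def rank_of_differential(x1, x2, tol=1e-12):
    # For a diagonal matrix the rank is the number of nonzero entries.
    return sum(1 for d in jacobian_diag(x1, x2) if abs(d) > tol)


def grad_composed(x1, x2):
    # Chain rule: grad(f o phi)(x) = J(x)^T grad f(phi(x)),
    # with grad f = (1, 1) for the hypothetical f(y1, y2) = y1 + y2.
    d1, d2 = jacobian_diag(x1, x2)
    return (d1 * 1.0, d2 * 1.0)


def hessian_diag_composed(x1, x2):
    # f(phi(x)) = sin x1 + sin x2, so the Hessian is
    # diag(-sin x1, -sin x2); return its diagonal.
    return (-math.sin(x1), -math.sin(x2))


point = (math.pi / 2, -math.pi / 2)
print("rank at (0, 0):     ", rank_of_differential(0.0, 0.0))
print("rank at point:      ", rank_of_differential(*point))
print("gradient at point:  ", grad_composed(*point))
print("Hessian diag, point:", hessian_diag_composed(*point))
```

At (0, 0) the differential is full rank and the composed gradient is nonzero. At (pi/2, -pi/2) the rank drops, the composed gradient vanishes up to rounding, and the Hessian has one negative and one positive eigenvalue, so the point is a saddle created by the parametrization.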

In the future, for questions regarding papers, feel free to pop me an email. You can find the email in the paper itself :)

Thank you for your kind reply and answer, Lezcano.
Thanks to your explanation, I see how saddle points occur.
So in general, your statement says that if the derivative of a reparametrization is full rank, then the reparametrized function will not acquire additional saddle points (or other zero-gradient points).
I would be glad to know I have understood you correctly.
Thank you very much for your help!

Exactly!
Again, happy you liked the paper :)