Lanczos initial value shouldn't be zero

Question

Lanczos initial value shouldn't be zero

Vaibhavdixit02 opened this issue 7 months ago · 8 comments

I've quickly checked this, it looks like a bug. step_solver!(dmp::AbstractManoptProblem{<:TangentSpace}, ls::LanczosState, i) with i == 1 gets a zero-vector in ls.X and we have division by 0. I'm not familiar with this solver so I don't know where this case should be handled.

From SciML/Optimization.jl#763 (comment)

Answer 1 · 2024-06-04T05:04:00.000Z

We should first turn this into a reproducible code here; this way anyone who wants to take a look first spends 30 minutes decoupling the example from your interface.

Answer 2 · 2024-06-04T05:14:09.000Z

Because for best of trying to extract the example I get

using Manopt, Manifoldsa
x0 = zeros(2)
p = [1.0, 100.0]

M = Euclidean(2)

rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
function rosenbrock_grad!(storage, x, p)
    storage[1] = -2.0 * (p[1] - x[1]) - 4.0 * p[2] * (x[2] - x[1]^2) * x[1]
    storage[2] = 2.0 * p[2] * (x[2] - x[1]^2)
end

f(M,q) = rosenbrock(q, p)
grad_f!(M, X, q) = rosenbrock_grad!(X, q, p)

adaptive_regularization_with_cubics(M, f, grad_f!, x0; evaluation=InplaceEvaluation(), debug=[:Iteration, :Iterate, :Cost, :Change, :GradientNorm,"\n",:Stop])

which nicely works and prints

Initial f(x): 1.000000
# 1     p: [0.15463069836819573, -8.948682504013403e-6]f(x): 0.771864Last Change: 0.154631|grad f(p)|:4.788582562341472
# 2     p: [0.24017049946593189, 0.04867068904225202]f(x): 0.585461Last Change: 0.098421|grad f(p)|:1.9172200558228285
# 3     p: [0.43967073785005883, 0.15268457096350788]f(x): 0.479014Last Change: 0.224987|grad f(p)|:10.114757347168405
# 4     p: [0.49560341946895287, 0.24216515151621215]f(x): 0.255611Last Change: 0.105524|grad f(p)|:0.7633854071043776
# 5     p: [0.49560341946895287, 0.24216515151621215]f(x): 0.255611Last Change: 0.000000|grad f(p)|:0.7633854071043776
# 6     p: [0.49560341946895287, 0.24216515151621215]f(x): 0.255611Last Change: 0.000000|grad f(p)|:0.7633854071043776
# 7     p: [0.49560341946895287, 0.24216515151621215]f(x): 0.255611Last Change: 0.000000|grad f(p)|:0.7633854071043776
# 8     p: [0.6619584402681458, 0.4094028797182764]f(x): 0.197136Last Change: 0.235887|grad f(p)|:9.021776250575371
# 9     p: [0.7056305691619287, 0.49577576368412024]f(x): 0.087111Last Change: 0.096786|grad f(p)|:0.4280075678174963
# 10    p: [0.7056305691619287, 0.49577576368412024]f(x): 0.087111Last Change: 0.000000|grad f(p)|:0.4280075678174963
# 11    p: [0.8597326554410162, 0.7150596261228528]f(x): 0.077663Last Change: 0.268017|grad f(p)|:9.338360241135486
# 12    p: [0.8830117721781882, 0.7791462313523332]f(x): 0.013718Last Change: 0.068184|grad f(p)|:0.11799863461264715
# 13    p: [0.9830128382412764, 0.9562966250460285]f(x): 0.010324Last Change: 0.203427|grad f(p)|:4.388981188412541
# 14    p: [0.9886860261786643, 0.9774684553936988]f(x): 0.000128Last Change: 0.021919|grad f(p)|:0.011939949794238172
# 15    p: [0.9999605841035202, 0.9997931493373116]f(x): 0.000002Last Change: 0.025010|grad f(p)|:0.05718017062672542
# 16    p: [1.0000003821964834, 1.0000007699623479]f(x): 0.000000Last Change: 0.000211|grad f(p)|:1.8389963739446183e-6
# 17    p: [1.000000003352786, 1.0000000067542727]f(x): 0.000000Last Change: 0.000001|grad f(p)|:1.6064310340716138e-8
# 18    p: [1.0000000000294509, 1.0000000000593303]f(x): 0.000000Last Change: 0.000000|grad f(p)|:1.4144281296869845e-10
The algorithm reached approximately critical point after 18 iterations; the gradient norm (1.4144281296869845e-10) is less than 1.0e-9.
2-element Vector{Float64}:
 1.0000000000294509
 1.0000000000593303

ao please help to make this reproducible as an example one can investigate.

edit: sure feel free to reopen, if problem persists and you have an example to reproduce this with Manopt. There might of course be boundary cases we missed in either ARC or Lanczos.

Answer 3 · 2024-06-04T07:32:51.000Z

I've prepared a reproducing example, though for a different function:

using Manopt, Manifolds
x0 = zeros(2)
p = [1.0, 100.0]

M = Euclidean(2)

function rosenbrock(x, p)
    if p[1] < 1
        return (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
    else
        return 0.0
    end
end

function rosenbrock_grad!(storage, x, p)
    if p[1] < 1
        storage[1] = -2.0 * (p[1] - x[1]) - 4.0 * p[2] * (x[2] - x[1]^2) * x[1]
        storage[2] = 2.0 * p[2] * (x[2] - x[1]^2)
    else
        storage .= 0
    end
end

f(M,q) = rosenbrock(q, p)
grad_f!(M, X, q) = rosenbrock_grad!(X, q, p)

adaptive_regularization_with_cubics(M, f, grad_f!, [1.0, 2.0]; evaluation=InplaceEvaluation(), debug=[:Iteration, :Iterate, :Cost, :Change, :GradientNorm,"\n",:Stop])

Answer 4 · 2024-06-04T07:38:45.000Z

Looks interesting, I can even change the start value a bit and it crashes.
It seems to be the – really a bit tricky – min_cubic_Newton that finds a zero of a cubic polynomial (special ones though).
Since this happens in the very first step, it might be a different issue from the one in the comment (for example problem in initialisation), not yet sure, it's already more than a year ago that I worked with this algorithm and its details.

Answer 5 · 2024-06-04T07:47:55.000Z

I think I found the culprit. The very first gradient evaluation (even outside of the solver at your initial point) is the zero vector – so you are at a critical point. We might have to catch that. I propose to check for a too small gradient in the first iteration or inside Lanczos and just return the point we started at.

edit: uff, this is harder than I though. Just catching X=0 does not so much help. I am confused, I would have thought if the gradient is zero after the first iteration we should stop due to a too small gradient. So the main trouble here is that ARC (or Lanczor) does not expect you to start directly at a minimiser. Sure this might by accident happen and we should catch that, I am just not yet sure how.

Answer 6 · 2024-06-04T16:54:40.000Z

Ah I now see the above problem of rosenbrock actually. Since p[1] is one and that is not less than one – both cost and grad are the zero function. Sure one can try to minimise the zero function – but I am not surprised that can fail. We can surely keep the fix still ;) But that makes the test a bit easier.

Answer 7 · 2024-06-04T19:14:54.000Z

Oh, I didn't check what exactly starts at the minimizer but the idea behind my example was to try make the algorithm jump to the flat area where the subsolver would fail.

Answer 8 · 2024-06-04T19:18:06.000Z

Since p is never varied you minimised the zero function (but you had its gradient correct); well, nevermind, it helped to fix a boundary case (exact zero gradient) in Lanczos.