gonum/optimize

Update of OptLoc on every iteration causes flaky optimization runs

Closed this issue · 11 comments

Reproducing case: http://play.golang.org/p/JDn004ekkD

The Rastrigin function is multimodal. In one of the bisection searches, the linesearch jumps over the closest minimum and finds a point in the next valley. Bisecting between these two locations, the linesearch finds the minimum closest to the starting point, which has a lower value than the origin point of the linesearch, but a higher value than the point found in the other valley. In numbers, the initial point is

201.32956023853234
-252.10162965126193

The bounding point found is

F = 89.99490432365724
ProjG = 294.0993005447343

While the function value is low, the gradient is high, and so the Wolfe conditions are not satisfied. The linesearch continues, eventually finding

f =  116.94945683564991
d =  127.80472382674293

This point satisfies the linesearch conditions. The optimization continues, eventually finding the local minimum in this trough, which has a function value higher than 90. However, OptLoc was updated to the 90 location, which has a high gradient value, and so the optimization never converges.

On the one hand, this is clearly suboptimal behavior by Bisection, and it should be improved, but at the same time it should be acceptable behavior for a Linesearch method. A local minimum was correctly found, and the optimization should converge once the low-gradient location is found.

I have previously argued for OptLoc, but I did not consider this case. Something else needs to be done.

I'm not sure what the correct fix is. In gradient-based methods, every new Major iteration has a (weakly) lower function value. The one catch is if the limit on function evaluations is reached in the middle of a linesearch; then Local should return the better of the point from the last major iteration and the current location (and thus storing OptLoc still has merit). Yes, it's unfortunate to lose the better value found along the way, but the point is to find a local minimum, and one was found.

This doesn't work as smoothly for gradient-free optimization. The start of every MajorIteration does not necessarily indicate a better function value. With NelderMead, for instance, it could have its best location at the first step and then shrink until convergence, never finding a better value. Here it seems like we should return the best value found, because there is no guarantee of getting successively better function values.

Ideas? @vladimir-ch

Now we check for improvement and update OptLoc at every (Major|Minor|Sub)Iteration. Wouldn't updating OptLoc only at MajorIteration solve the issue? We would miss troughs leading to lower minima, but that is inevitable anyway.

It's not so clear. Take Nelder-Mead, for example. When it shrinks the vertices, it gets a whole bunch of new points in one sub-iteration. One of those may be the new minimum, but it may not be the most recent evaluation. This would be a good use case for the location being different and requesting NoEvaluation, but that breaks the invariant that Location is only a one-way communication to Method.

My understanding of the meaning of MajorIteration is that the method has made some (unspecified) progress and that it is time to check for convergence. Linesearch is lucky that the location for MajorIteration coincides with the location from MinorIteration. From this point of view, Nelder-Mead would have to keep track of the best location internally and use it when announcing major iteration.

Based on the current abstraction, there is no mechanism for the Method to announce such information. Right now, Method is supposed to receive information from Location and fill in nextLocation (it is not supposed to modify Location). Nelder-Mead would have to modify Location (and Local would have to be able to interpret such modifications) in order to announce in this way.

You do hint at the broader problem which is that we (eventually) need to decide what the various iteration types actually mean. I think it's okay that we don't have a formal definition yet (as it shifts as we discover things), but we should be keeping that in mind.

Of course, you are right, I didn't realize that that would mean modifying Location. I'll ponder it further.

Agreed on the meaning of iteration types.

We could solve the issue by defining the meaning of iteration types. Just thinking aloud. We could have four iteration types as we have now with clear meaning and actions taken when received (without caring about the names for the moment):

  • MajorIteration: location is complete, can be considered for storing in optloc and convergence should be checked (i.e., just as it is now)
  • MinorIteration: location is complete, can be considered for storing in optloc, but convergence should not be checked yet (i.e., Linesearch would have to be changed, Nelder-Mead would use this iteration)
  • SubIteration: location is not complete (Linesearch would use this iteration during line search and for evaluating invalid fields in location)
  • NoIteration: same as now

It is slightly artificial and tailored for the current Methods in optimize, but it is an option nonetheless.

Another option is to take the OR-able approach used for the function evaluations. Methods could compose the iteration type from elementary types that indicate what should, or can, be done with the location.

I wonder if I should have closed this with #123, @btracey ?

Yes, I think so. MajorIteration has a clear meaning, and the others are gone.

Fixed by #123