TomographicImaging/CIL

Change in behaviour Armijio rule

MargaretDuff opened this issue · 3 comments

Description

When running demo 2.1 release 23.1.0 with the new implementation of the Armijio rule, it doesn't seem to work, despite the fact that backwards compatibility should be maintained by PR #1768.

Should check that a bug hasn't been introduced

Environment

import cil, sys
print(cil.version.version, cil.version.commit_hash, sys.version, sys.platform)

So investigating, the "old" Armijo implementation, for each iteration it initialises a step size, tests the step size, either returns or reduces the step size and repeats. In the next iteration, the step size is initialised with the step size from the last iteration.

In the new implementation, for each iteration, it initialises a step size, tests the step size, either returns or reduces the step size and repeats. In the next iteration, the step size is initialised the original starting step size.

Thoughts:

  • The old implementation will in most cases require fewer armijo rule evaluations and will be quicker
  • The old implementation is less likely to fail - e.g. what we have seen in the iterative notebooks
  • The new implementation allows for step sizes to increase so that if you start in a particularly difficult place but then the landscape gets easier, you can take bigger steps.

Potential solutions

I favour the second option

@paskino @jakobsj - it would be good to have your opinion on this

For context, the Armijo rule requires the evaluation of the objective function for each iteration

while k < self.max_iterations:
algorithm.gradient_update.multiply(self.alpha, out=self.x_armijo)
algorithm.solution.subtract(self.x_armijo, out=self.x_armijo)
f_x_a = algorithm.objective_function(self.x_armijo)

This is costly, as for LeastSquares it entails the call of the direct of the operator, which needs to be added to the other calls in the algorithm itself.

def __call__(self, x):
r""" Returns the value of :math:`F(x) = c\|Ax-b\|_2^2` or :math:`c\|Ax-b\|_{2,W}^2`, where :math:`W=\text{diag}(weight)`:
"""
# c * (A.direct(x)-b).dot((A.direct(x) - b))
y = self.A.direct(x)

So, calling the Armijo rule n times to find the best step size, we have to do n+1 calls to the evaluation of the objective function (+1 because there is an initial).

The new behaviour may suggest that we can limit the iterations in the Armijo rule. I am in favour of the new implementation, i.e. starting alpha from the previous one, with flag to enable restart from alpha=1e6 if it fails to find a step size.

Thanks @paskino, that agrees with what I was thinking and have implanted in #1934