Change in behaviour Armijio rule

Description

When running demo 2.1 release 23.1.0 with the new implementation of the Armijio rule, it doesn't seem to work, despite the fact that backwards compatibility should be maintained by PR #1768.

Should check that a bug hasn't been introduced

Environment

import cil, sys
print(cil.version.version, cil.version.commit_hash, sys.version, sys.platform)

So investigating, the "old" Armijo implementation, for each iteration it initialises a step size, tests the step size, either returns or reduces the step size and repeats. In the next iteration, the step size is initialised with the step size from the last iteration.

In the new implementation, for each iteration, it initialises a step size, tests the step size, either returns or reduces the step size and repeats. In the next iteration, the step size is initialised the original starting step size.

Thoughts:

The old implementation will in most cases require fewer armijo rule evaluations and will be quicker
The old implementation is less likely to fail - e.g. what we have seen in the iterative notebooks
The new implementation allows for step sizes to increase so that if you start in a particularly difficult place but then the landscape gets easier, you can take bigger steps.

Potential solutions

Return to the previous behaviour
Have an option to restart or not, potentially default is the previous behaviour
Have an option for a 2-way backtracking e.g. https://en.wikipedia.org/wiki/Backtracking_line_search#Time_efficiency

I favour the second option

@paskino @jakobsj - it would be good to have your opinion on this

For context, the Armijo rule requires the evaluation of the objective function for each iteration

CIL/Wrappers/Python/cil/optimisation/utilities/StepSizeMethods.py

Lines 125 to 130 in a181c35

    
           while k < self.max_iterations: 
        
               algorithm.gradient_update.multiply(self.alpha, out=self.x_armijo) 
        
               algorithm.solution.subtract(self.x_armijo, out=self.x_armijo) 
        
               f_x_a = algorithm.objective_function(self.x_armijo)

This is costly, as for LeastSquares it entails the call of the direct of the operator, which needs to be added to the other calls in the algorithm itself.

CIL/Wrappers/Python/cil/optimisation/functions/LeastSquares.py

Lines 68 to 74 in a181c35

    
               def __call__(self, x): 
        
                   r""" Returns the value of :math:`F(x) = c\|Ax-b\|_2^2` or :math:`c\|Ax-b\|_{2,W}^2`, where :math:`W=\text{diag}(weight)`: 
        
                   """ 
        
                   # c * (A.direct(x)-b).dot((A.direct(x) - b)) 
        
                   y = self.A.direct(x)

So, calling the Armijo rule n times to find the best step size, we have to do n+1 calls to the evaluation of the objective function (+1 because there is an initial).

The new behaviour may suggest that we can limit the iterations in the Armijo rule. I am in favour of the new implementation, i.e. starting alpha from the previous one, with flag to enable restart from alpha=1e6 if it fails to find a step size.

Thanks @paskino, that agrees with what I was thinking and have implanted in #1934

	while k < self.max_iterations:

	algorithm.gradient_update.multiply(self.alpha, out=self.x_armijo)
	algorithm.solution.subtract(self.x_armijo, out=self.x_armijo)

	f_x_a = algorithm.objective_function(self.x_armijo)

	def __call__(self, x):

	r""" Returns the value of :math:`F(x) = c\\|Ax-b\\|_2^2` or :math:`c\\|Ax-b\\|_{2,W}^2`, where :math:`W=\text{diag}(weight)`:

	"""
	# c * (A.direct(x)-b).dot((A.direct(x) - b))
	y = self.A.direct(x)