Infinite loops in BatchGradientDescent with flat constraints
AlexisMignon opened this issue · 3 comments
Hello,
I'm encountering an issue using BatchGradientDescent with param_constrainers. Here is an example of the issue:
```python
from pylearn2.optimization.batch_gradient_descent import BatchGradientDescent
import numpy as np
import theano
import theano.tensor as T

x = theano.shared(np.float32(1.0))  # my config uses GPU with float32

def non_negative_x(updates):
    x_update = updates[x]
    updates[x] = T.maximum(0, x_update)

cost = (x + 2) ** 2
bg = BatchGradientDescent(cost, [x], param_constrainers=[non_negative_x], verbose=1)
bg.minimize()
```
This leads to an infinite loop. Examining the output obtained with verbose=1 shows that the step size grows without bound.
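To see why the line search never finds a preferred step size, here is a minimal pure-Python sketch (not pylearn2 code; `constrained_obj` is a hypothetical helper) of the constrained objective as a function of the step size alpha. The cost (x + 2)^2 pushes x negative, but the constraint clamps it to 0, so once x sits on the boundary every positive step size yields exactly the same objective value:

```python
# Hypothetical sketch, not pylearn2 code: the objective seen by the line
# search, starting from the constrained optimum x = 0.
def constrained_obj(alpha, x=0.0):
    grad = 2.0 * (x + 2.0)              # d/dx of (x + 2)^2
    x_new = max(0.0, x - alpha * grad)  # non-negativity constraint clamps x
    return (x_new + 2.0) ** 2

# Every positive step size gives the same objective: the line search sees
# a flat function, so all candidate alphas tie.
print([constrained_obj(a) for a in (0.1, 1.0, 10.0, 1e6)])  # [4.0, 4.0, 4.0, 4.0]
```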
In the file batch_gradient_descent.py, line 353, one reads:
```python
# Use <= rather than = so if there are ties
# the bigger step size wins
if obj <= best_obj:
    best_obj = obj
    best_alpha = alpha
    best_alpha_ind = ind
```
I guess the problem comes from the '<=' instead of '<'. The comment claims it is better to take the bigger step size when ties occur, but in this case every step size above a certain value produces a tie.
A simple correction would be to replace '<=' with '<', but since there is a special comment about this point, I am wondering whether there is a strong reason not to do so.
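The tie-breaking behaviour can be sketched in a few lines of pure Python (`line_search` here is a hypothetical stand-in, not the pylearn2 function): on a flat objective, '<=' always selects the last, i.e. largest, candidate step size, while '<' keeps the first one:

```python
# Hypothetical sketch of the tie-breaking logic at line 353.
def line_search(alpha_list, obj, strict=False):
    best_obj, best_alpha_ind = float('inf'), -1
    for ind, alpha in enumerate(alpha_list):
        o = obj(alpha)
        better = (o < best_obj) if strict else (o <= best_obj)
        if better:
            best_obj, best_alpha_ind = o, ind
    return best_alpha_ind

def flat(alpha):
    return 4.0  # a flat objective: every step size ties

alphas = [0.1, 1.0, 10.0]
print(line_search(alphas, flat))               # 2: last index wins
print(line_search(alphas, flat, strict=True))  # 0: first index wins
```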
I think your princess is in another castle. Line 353 is not inside a while loop; it is inside a for loop over a fixed-length list of step sizes: https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/optimization/batch_gradient_descent.py#L347
So if you're getting an infinite step size, it's not happening inside that particular loop, or it's because someone put an infinite step size into that list. It's not a result of how we break ties within that loop.
The problem comes from a side effect of the condition in line 347:
```python
for ind, alpha in enumerate(alpha_list):
    self._goto_alpha(alpha)
    obj = self.obj(*inputs)
    if self.verbose:
        logger.info('\t{0} {1}'.format(alpha, obj))
    # Use <= rather than = so if there are ties
    # the bigger step size wins
    if obj <= best_obj:
        best_obj = obj
        best_alpha = alpha
        best_alpha_ind = ind
```
which allows the step size to keep growing even though there is no improvement, combined with the condition at line 377:
```python
elif best_alpha_ind > len(alpha_list) - 2:
    alpha_list = [alpha * 2. for alpha in alpha_list]
    if self.verbose:
        logger.info('growing the step size')
```
which says that if the last (largest) step size in the list was selected, then all step sizes in the list are multiplied by two.
In the case of a flat function there is no way for the algorithm to stop. I see at least two solutions:
- Go back to the strict comparison '<'
- Explicitly check that the objective has not improved over the last few iterations, and stop if it has not
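For illustration, here is a self-contained sketch (hypothetical names, not pylearn2 code) of how the two code paths interact on a flat objective: with '<=' the last index always wins the inner loop, so the outer loop doubles the step list on every pass and never converges, while the strict comparison stops growth immediately:

```python
# Hypothetical sketch of the interaction between the tie-break at line 353
# and the growth condition at line 377, on a perfectly flat objective.
def count_growth_steps(strict, max_outer=20):
    alpha_list = [0.1, 1.0, 10.0]
    grown = 0
    for _ in range(max_outer):
        best_obj, best_ind = float('inf'), -1
        for ind, alpha in enumerate(alpha_list):
            obj = 4.0  # flat constrained objective: every alpha ties
            if (obj < best_obj) if strict else (obj <= best_obj):
                best_obj, best_ind = obj, ind
        if best_ind > len(alpha_list) - 2:  # last entry won: grow the list
            alpha_list = [a * 2.0 for a in alpha_list]
            grown += 1
        else:
            break
    return grown

print(count_growth_steps(strict=False))  # 20: grows on every outer pass (unbounded)
print(count_growth_steps(strict=True))   # 0: terminates immediately
```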