lisa-lab/pylearn2

Infinite loops in BatchGradientDescent with flat constraints

AlexisMignon opened this issue · 3 comments

Hello,

I'm encountering an issue using BatchGradientDescent with param_constrainers.

Here is an example of the issue:

from pylearn2.optimization.batch_gradient_descent import BatchGradientDescent
import numpy as np
import theano
import theano.tensor as T

x = theano.shared(np.float32(1.0)) # my config uses GPU with float32
def non_negative_x(updates):
     x_update = updates[x]
     updates[x] = T.maximum(0, x_update)

cost = (x + 2) ** 2

bg = BatchGradientDescent(cost, [x], param_constrainers=[non_negative_x], verbose=1)
bg.minimize()

This leads to an infinite loop. Examining the output obtained with 'verbose=1', shows that the step size becomes infinite.

In the file batch_gradient_descent.py, line 353, one reads:

                    # Use <= rather than = so if there are ties
                    # the bigger step size wins
                    if obj <= best_obj:
                        best_obj = obj
                        best_alpha = alpha
                        best_alpha_ind = ind

I guess the problems comes from the '<=' instead of '<'. The comment seems to claim it is better to take the greater step size when ties occur, but in this case all values for step sizes bigger than some value lead to ties.

A simple correction would be to replace '<=' by '<', but since there is a special comment about this point, I am wondering if there is a strong reason not to do so.

According to the history, this was changed by @goodfeli in f37e844.
Hopefully he can give more insight on what the best way to handle that would be.

I think your princess is in another castle. Line 353 is not inside a while loop. It is inside a for loop over a fixed length list of step sizes: https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/optimization/batch_gradient_descent.py#L347

So if you're getting an infinite step size, it's not as part of that particular loop, or it's because someone put an infinite step size into that list. It's not a result of how we break ties within that loop.

The problem comes from a side effect of the condition in line 347:

                for ind, alpha in enumerate(alpha_list):
                    self._goto_alpha(alpha)
                    obj = self.obj(*inputs)
                    if self.verbose:
                        logger.info('\t{0} {1}'.format(alpha, obj))

                    # Use <= rather than = so if there are ties
                    # the bigger step size wins
                    if obj <= best_obj:
                        best_obj = obj
                        best_alpha = alpha
                        best_alpha_ind = ind

which allows to keep growing the step size even though there is no improvement, and the condition in line 377:

                elif best_alpha_ind > len(alpha_list) - 2:
                    alpha_list = [alpha * 2. for alpha in alpha_list]
                    if self.verbose:
                        logger.info('growing the step size')

which says that if the last step size of the list was used then all step sizes in the list should be multiplied by two.

In the case of a flat function there is no way to make the algorithm stop. I see at least two solutions:

  1. Go back to the strict comparison
  2. Specifically check that the function did not improve over the last few iterations