ogrisel/pygbm

numba-integration-test failure

esc opened this issue · 6 comments

esc commented

The numba-integration-tests failed and I am trying to figure out whether this is a legit failure or whether we introduced a new bug in Numba.

The failure is:

=================================== FAILURES ===================================
_________________ test_derivatives[binary_crossentropy-0.3-0] __________________

loss = <pygbm.loss.BinaryCrossEntropy object at 0x7f4f2853da50>
x0 = array([[0.3]], dtype=float32), y_true = array([0.], dtype=float32)

    @pytest.mark.parametrize('loss, x0, y_true', [
        ('least_squares', -2., 42),
        ('least_squares', 117., 1.05),
        ('least_squares', 0., 0.),
        ('binary_crossentropy', 0.3, 0),
        ('binary_crossentropy', -12, 1),
        ('binary_crossentropy', 30, 1),
    ])
    def test_derivatives(loss, x0, y_true):
        # Check that gradients are zero when the loss is minimized on 1D array
        # using the Newton-Raphson and the first and second order derivatives
        # computed by the Loss instance.
    
        loss = _LOSSES[loss]()
        y_true = np.array([y_true], dtype=np.float32)
        x0 = np.array([x0], dtype=np.float32).reshape(1, 1)
        get_gradients, get_hessians = get_derivatives_helper(loss)
    
        def func(x):
            return loss(y_true, x)
    
        def fprime(x):
            return get_gradients(y_true, x)
    
        def fprime2(x):
            return get_hessians(y_true, x)
    
>       optimum = newton(func, x0=x0, fprime=fprime, fprime2=fprime2)

tests/test_loss.py:78: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = <function test_derivatives.<locals>.func at 0x7f4f0c4e6440>
x0 = array([[0.3]], dtype=float32)
fprime = <function test_derivatives.<locals>.fprime at 0x7f4f0c4e67a0>
args = (), tol = 1.48e-08, maxiter = 50
fprime2 = <function test_derivatives.<locals>.fprime2 at 0x7f4f0c4e65f0>
x1 = None, rtol = 0.0, full_output = False, disp = True

    def newton(func, x0, fprime=None, args=(), tol=1.48e-8, maxiter=50,
               fprime2=None, x1=None, rtol=0.0,
               full_output=False, disp=True):
        """
        Find a zero of a real or complex function using the Newton-Raphson
        (or secant or Halley's) method.
    
        Find a zero of the function `func` given a nearby starting point `x0`.
        The Newton-Raphson method is used if the derivative `fprime` of `func`
        is provided, otherwise the secant method is used.  If the second order
        derivative `fprime2` of `func` is also provided, then Halley's method is
        used.
    
        If `x0` is a sequence with more than one item, then `newton` returns an
        array, and `func` must be vectorized and return a sequence or array of the
        same shape as its first argument. If `fprime` or `fprime2` is given then
        its return must also have the same shape.
    
        Parameters
        ----------
        func : callable
            The function whose zero is wanted. It must be a function of a
            single variable of the form ``f(x,a,b,c...)``, where ``a,b,c...``
            are extra arguments that can be passed in the `args` parameter.
        x0 : float, sequence, or ndarray
            An initial estimate of the zero that should be somewhere near the
            actual zero. If not scalar, then `func` must be vectorized and return
            a sequence or array of the same shape as its first argument.
        fprime : callable, optional
            The derivative of the function when available and convenient. If it
            is None (default), then the secant method is used.
        args : tuple, optional
            Extra arguments to be used in the function call.
        tol : float, optional
            The allowable error of the zero value.  If `func` is complex-valued,
            a larger `tol` is recommended as both the real and imaginary parts
            of `x` contribute to ``|x - x0|``.
        maxiter : int, optional
            Maximum number of iterations.
        fprime2 : callable, optional
            The second order derivative of the function when available and
            convenient. If it is None (default), then the normal Newton-Raphson
            or the secant method is used. If it is not None, then Halley's method
            is used.
        x1 : float, optional
            Another estimate of the zero that should be somewhere near the
            actual zero.  Used if `fprime` is not provided.
        rtol : float, optional
            Tolerance (relative) for termination.
        full_output : bool, optional
            If `full_output` is False (default), the root is returned.
            If True and `x0` is scalar, the return value is ``(x, r)``, where ``x``
            is the root and ``r`` is a `RootResults` object.
            If True and `x0` is non-scalar, the return value is ``(x, converged,
            zero_der)`` (see Returns section for details).
        disp : bool, optional
            If True, raise a RuntimeError if the algorithm didn't converge, with
            the error message containing the number of iterations and current
            function value.  Otherwise the convergence status is recorded in a
            `RootResults` return object.
            Ignored if `x0` is not scalar.
            *Note: this has little to do with displaying, however
            the `disp` keyword cannot be renamed for backwards compatibility.*
    
        Returns
        -------
        root : float, sequence, or ndarray
            Estimated location where function is zero.
        r : `RootResults`, optional
            Present if ``full_output=True`` and `x0` is scalar.
            Object containing information about the convergence.  In particular,
            ``r.converged`` is True if the routine converged.
        converged : ndarray of bool, optional
            Present if ``full_output=True`` and `x0` is non-scalar.
            For vector functions, indicates which elements converged successfully.
        zero_der : ndarray of bool, optional
            Present if ``full_output=True`` and `x0` is non-scalar.
            For vector functions, indicates which elements had a zero derivative.
    
        See Also
        --------
        brentq, brenth, ridder, bisect
        fsolve : find zeros in n dimensions.
    
        Notes
        -----
        The convergence rate of the Newton-Raphson method is quadratic,
        the Halley method is cubic, and the secant method is
        sub-quadratic.  This means that if the function is well behaved
        the actual error in the estimated zero after the n-th iteration
        is approximately the square (cube for Halley) of the error
        after the (n-1)-th step.  However, the stopping criterion used
        here is the step size and there is no guarantee that a zero
        has been found. Consequently the result should be verified.
        Safer algorithms are brentq, brenth, ridder, and bisect,
        but they all require that the root first be bracketed in an
        interval where the function changes sign. The brentq algorithm
        is recommended for general use in one dimensional problems
        when such an interval has been found.
    
        When `newton` is used with arrays, it is best suited for the following
        types of problems:
    
        * The initial guesses, `x0`, are all relatively the same distance from
          the roots.
        * Some or all of the extra arguments, `args`, are also arrays so that a
          class of similar problems can be solved together.
        * The size of the initial guesses, `x0`, is larger than O(100) elements.
          Otherwise, a naive loop may perform as well or better than a vector.
    
        Examples
        --------
        >>> from scipy import optimize
        >>> import matplotlib.pyplot as plt
    
        >>> def f(x):
        ...     return (x**3 - 1)  # only one real root at x = 1
    
        ``fprime`` is not provided, use the secant method:
    
        >>> root = optimize.newton(f, 1.5)
        >>> root
        1.0000000000000016
        >>> root = optimize.newton(f, 1.5, fprime2=lambda x: 6 * x)
        >>> root
        1.0000000000000016
    
        Only ``fprime`` is provided, use the Newton-Raphson method:
    
        >>> root = optimize.newton(f, 1.5, fprime=lambda x: 3 * x**2)
        >>> root
        1.0
    
        Both ``fprime2`` and ``fprime`` are provided, use Halley's method:
    
        >>> root = optimize.newton(f, 1.5, fprime=lambda x: 3 * x**2,
        ...                        fprime2=lambda x: 6 * x)
        >>> root
        1.0
    
        When we want to find zeros for a set of related starting values and/or
        function parameters, we can provide both of those as an array of inputs:
    
        >>> f = lambda x, a: x**3 - a
        >>> fder = lambda x, a: 3 * x**2
        >>> np.random.seed(4321)
        >>> x = np.random.randn(100)
        >>> a = np.arange(-50, 50)
        >>> vec_res = optimize.newton(f, x, fprime=fder, args=(a, ))
    
        The above is the equivalent of solving for each value in ``(x, a)``
        separately in a for-loop, just faster:
    
        >>> loop_res = [optimize.newton(f, x0, fprime=fder, args=(a0,))
        ...             for x0, a0 in zip(x, a)]
        >>> np.allclose(vec_res, loop_res)
        True
    
        Plot the results found for all values of ``a``:
    
        >>> analytical_result = np.sign(a) * np.abs(a)**(1/3)
        >>> fig = plt.figure()
        >>> ax = fig.add_subplot(111)
        >>> ax.plot(a, analytical_result, 'o')
        >>> ax.plot(a, vec_res, '.')
        >>> ax.set_xlabel('$a$')
        >>> ax.set_ylabel('$x$ where $f(x, a)=0$')
        >>> plt.show()
    
        """
        if tol <= 0:
            raise ValueError("tol too small (%g <= 0)" % tol)
        if maxiter < 1:
            raise ValueError("maxiter must be greater than 0")
        if np.size(x0) > 1:
            return _array_newton(func, x0, fprime, args, tol, maxiter, fprime2,
                                 full_output)
    
        # Convert to float (don't use float(x0); this works also for complex x0)
        p0 = 1.0 * x0
        funcalls = 0
        if fprime is not None:
            # Newton-Raphson method
            for itr in range(maxiter):
                # first evaluate fval
                fval = func(p0, *args)
                funcalls += 1
                # If fval is 0, a root has been found, then terminate
                if fval == 0:
                    return _results_select(
                        full_output, (p0, funcalls, itr, _ECONVERGED))
                fder = fprime(p0, *args)
                funcalls += 1
                if fder == 0:
                    msg = "Derivative was zero."
                    if disp:
                        msg += (
                            " Failed to converge after %d iterations, value is %s."
                            % (itr + 1, p0))
>                       raise RuntimeError(msg)
E                       RuntimeError: Derivative was zero. Failed to converge after 46 iterations, value is [[-89.88232]].

../miniconda3/envs/pygbm/lib/python3.7/site-packages/scipy/optimize/zeros.py:294: RuntimeError
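
For readers who want to poke at the failure mode without SciPy or pygbm installed, here is a minimal pure-Python sketch of the zero-derivative guard visible at the end of the traceback above. It is not SciPy's implementation: `newton_sketch` and its `(estimate, converged)` return value are hypothetical names chosen for illustration, and only the plain Newton-Raphson branch is modelled (no secant or Halley step).

```python
def newton_sketch(func, fprime, x0, maxiter=50, tol=1.48e-8, disp=True):
    """Toy Newton-Raphson loop with a zero-derivative guard.

    Hypothetical helper for illustration only; returns (estimate, converged).
    """
    p0 = 1.0 * x0
    for itr in range(maxiter):
        fval = func(p0)
        # An exact zero of func means a root has been found.
        if fval == 0:
            return p0, True
        fder = fprime(p0)
        # A zero derivative makes the Newton step undefined.
        if fder == 0:
            msg = "Derivative was zero."
            if disp:
                msg += (" Failed to converge after %d iterations, value is %s."
                        % (itr + 1, p0))
                raise RuntimeError(msg)
            # Assumption of this sketch: report non-convergence instead of raising.
            return p0, False
        p = p0 - fval / fder
        # Stop when the step size is below the tolerance.
        if abs(p - p0) <= tol:
            return p, True
        p0 = p
    if disp:
        raise RuntimeError("Failed to converge after %d iterations, value is %s."
                           % (maxiter, p0))
    return p0, False


if __name__ == "__main__":
    # x**2 + 1 has no real root and a zero derivative at the starting point.
    try:
        newton_sketch(lambda x: x**2 + 1, lambda x: 2 * x, 0.0)
    except RuntimeError as exc:
        print(exc)
```

With `disp=False` the same call returns the current estimate flagged as not converged rather than raising, which is one way a caller could tolerate a vanishing derivative.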

And the full log is:

https://circleci.com/gh/numba/numba-integration-testing/1120

It's not Numba; it's a SciPy version issue (the tests pass with SciPy 1.1.0).

esc commented

OK, should I just wait for that version of SciPy to hit anaconda.org?

1.1.0 is a previous version.

This may be a bug in the latest scipy, but TBH I don't have time right now to track it down. Maybe your best bet for now is to pin the scipy version to 1.1.0 when you test pygbm?

Also @esc, mostly unrelated, but I've been having fun lately with this numba-based HMM implementation: https://github.com/NicolasHug/hmmkay

It's pretty basic (and I don't know how far I want to go maintaining it) but maybe you'd want to test against this one too, just in case.

esc commented

@NicolasHug thanks for the info! Is there an open ticket in SciPy somewhere? Is this possibly something that needs a hotfix from SciPy?

esc commented

Indeed: numba/numba-integration-testing#27, although an upstream solution would, of course, be preferable.