boostorg/math

Noncentral F distribution CDF edge cases

Opened this issue · 3 comments

SciPy is thanks to Boost finally able to remove parts of the ancient cdflib Fortran library. Recently, the noncentral F distribution CDF was fully replaced by the boost implementation in scipy/scipy#21505, resulting in a far more accurate results for a wide range of inputs. Still, edge cases were uncovered which we would like to share with you. They occur for very large and very tiny values for the degrees of freedom. Note that for the second case even Mathematica is not able to compute a result, so maybe we do not need to thrive for a feasible value but raise an error or return nan. @mdhaber found a limiting formula that might be helpful: scipy/scipy#21505 (comment)

v1 v2 l x Boost result
1e-100 3 1.5 1e100 nan
1e20 1e20 1 1 -124.40303872682864

Reproduced, first one is an easy fix. Not so sure about the second yet...

Update: working on this now, there are some easy to fix numerical-stability issues which sort of fix both, but ideally we need some asymptotic expansions for the incomplete beta with large parameters: these exist, and I have them somewhat working, but they are deeply numerically unstable, especially near the saddle point. Hopefully more soon...

FYI: We found a likely related edge for the noncentral t distribution CDF in scipy/scipy#21728 (comment) .

For df=980, noncentrality=38, x=1.5 boost computes a negative value of the order of -1e-70.