numpy/numpy-financial

Bizarre linalg errors with IRR

benkuhn opened this issue · 3 comments

The following code throws an error:

>>> lst = [-3000.0, 2.3926932267015667e-07, 4.1672087103345505e-16, 5.3965110036378706e-25, 5.1962551071806174e-34, 3.7202955645436402e-43, 1.9804961711632469e-52, 7.8393517651814181e-62, 2.3072565113911438e-71, 5.0491839233308912e-81, 8.2159177668499263e-91, 9.9403244366963527e-101, 8.942410813633967e-111, 5.9816122646481191e-121, 2.9750309031844241e-131, 1.1002067043497954e-141, 3.0252876563518021e-152, 6.1854121948207909e-163, 9.4032980015353301e-174, 1.0629218520017728e-184, 8.9337141847171845e-196, 5.5830607698467935e-207, 2.5943122036622652e-218, 8.9635842466507006e-230, 2.3027710094332358e-241, 4.3987510596745562e-253, 6.2476630372575209e-265, 6.598046841695288e-277, 5.1811095266842017e-289, 3.0250999925830644e-301, 1.3133070599585015e-313]
>>> np.irr(lst)
...
LinAlgError: Array must not contain infs or NaNs

Bizarrely, if you change the e-313 to e-300 or e-330, the problem goes away:

>>> np.irr(lst[:30] + [1.31330705996e-300])
-0.9999999990596069

>>> np.irr(lst[:30] + [1.31330705996e-330])
-0.9999999990596069

I don't have enough numerics-fu to know what's going on here, unfortunately. For now I'm happy to round the array to 10 decimal places, which works around it, but thought I'd file a bug since the behavior is pretty weird.
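
A sketch of that workaround, for reference, assuming plain np.round over the whole list is all that's meant; rounding to 10 decimal places turns every tiny entry into an exact 0.0:

>>> np.irr(np.round(lst, 10))  # runs without the LinAlgError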

Confirmed.

1.31e-313 is a subnormal number. If you change it to e-300 it becomes a normal number, and in double precision e-330 is a fancy way to write zero.

Something doesn't appear to like the subnormal. Since you get a LinAlgError, I'm betting on LAPACK.
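
A quick interactive check of the three regimes, using np.finfo (nothing here is specific to np.irr):

>>> import numpy as np
>>> np.finfo(np.float64).tiny  # smallest *normal* double
2.2250738585072014e-308
>>> 0.0 < 1.3133070599585015e-313 < np.finfo(np.float64).tiny  # subnormal
True
>>> 1.31330705996e-300 > np.finfo(np.float64).tiny  # normal
True
>>> 1.31330705996e-330 == 0.0  # underflows to exactly zero when parsed
True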

In np.irr, the roots of a polynomial are found from its companion matrix. While forming the companion matrix, the code divides by the last nonzero entry in your list, which puts an 'inf' value into the array passed to np.linalg.eigvals. If that last number is a little larger, the 'inf' is no longer produced and the eigenvalue computation can proceed. If it is a little smaller, it is treated as exactly 0 (floating-point numbers can't store exponents that small) and is dropped from the computation. The error is raised when, prior to computing the eigenvalues, numpy.linalg checks that the array is finite.

The zero-checking is happening at https://github.com/numpy/numpy/blob/master/numpy/lib/polynomial.py#L211.
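
A small sketch of where the inf comes from, assuming np.irr hands the reversed cashflow list to np.roots, so the last (subnormal) entry becomes the leading coefficient everything is divided by; the three-element list is just a hypothetical stand-in for the one in the report:

import numpy as np

# Hypothetical shortened cashflow list: large first entry, subnormal last entry.
values = [-3000.0, 2.3926932267015667e-07, 1.3133070599585015e-313]
p = np.array(values[::-1])   # reversed: the subnormal is now the leading coefficient
row = -p[1:] / p[0]          # the first row of the companion matrix np.roots builds
print(row)                   # 3000 / 1.31e-313 overflows to inf
print(np.isinf(row).any())   # True -> the finiteness check in np.linalg raises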

I'm not really sure how to fix it. The only solution I see here is to treat denormal numbers as exactly 0 when running the rootfinding algorithm, though that could add a performance hit. Really it's a problem inherent to the data.
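
A user-side version of that idea, as a sketch (irr_flush_subnormals is a hypothetical helper, not anything in numpy): flush anything smaller than the smallest normal double to zero before calling np.irr.

import numpy as np

def irr_flush_subnormals(values):
    # Hypothetical helper: treat subnormal cashflows as exact zeros, so the
    # rootfinding never divides by a denormal coefficient.
    v = np.asarray(values, dtype=np.float64)
    tiny = np.finfo(v.dtype).tiny             # smallest positive normal float64
    v = np.where(np.abs(v) < tiny, 0.0, v)    # flush subnormals to exactly 0.0
    return np.irr(v)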