pydata/numexpr

pow with integer arrays that overflow differs from numpy in 2.8.7

mroeschke opened this issue · 5 comments

It appears that pow was not implemented in 2.8.4 but was newly implemented(?) in 2.8.7; however, pow with integer arrays that overflow behaves differently from numpy:

In [1]: import numpy as np

In [2]: import numexpr as ne

In [3]: arr = np.array([10000, 20000])

In [4]: arr**arr
Out[4]: array([0, 0])

In [5]: ne.evaluate('arr**arr')  # '2.8.4'
TypeError: '<' not supported between instances of 'str' and 'int'

In [5]: ne.evaluate('arr**arr')  # '2.8.7'
Out[5]: array([9223372036854775807, 9223372036854775807], dtype=int64)

You can see the context for where the changes were made here:

#434

2.8.4 was ignoring the virtual machine (VM) and short-cutting the calculation, but the implementation was buggy. At some point, someone implemented a `> 0` check over the entire input array in the pre-processing for integer power, which goes against the whole design objective of NumExpr.

I would argue that the NumPy behavior is also wrong. NumPy raises an exception if you feed a negative exponent to an integer base; it should also error in this case, since the answer is clearly not 0.
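For reference, the negative-exponent behavior mentioned above can be checked directly; NumPy refuses negative integer exponents for integer bases rather than silently returning a wrong integer:

```python
import numpy as np

# NumPy raises for an integer base with a negative integer exponent:
try:
    np.array([2, 3]) ** -1
except ValueError as e:
    print(e)  # Integers to negative integer powers are not allowed.
```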

The VM doesn't have any faculty to, for example, set an error flag that would cause an exception to be raised after the calculation completes.

Agreed that the numpy result is also not correct/useful.

So in general for numexpr, any overflowing expression is expected to truncate at the max value of the specified data type?

How NumExpr will behave in this situation will depend on the CPU architecture, I think; C++ doesn't specify a behavior for signed integer overflow, so it's undefined.

The other operations don't; pow is special because it's implemented by casting to double and back. See pandas-dev/pandas#54546.
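The two results in the original report can both be reproduced from that mechanism. NumPy's int64 power wraps modulo 2**64 (and 10000**10000 = 2**40000 * 5**40000 happens to be divisible by 2**64, hence the zeros), while computing in float64 overflows to inf, and saturating that back into the int64 range gives INT64_MAX. This is a sketch of the mechanism in NumPy/Python, not numexpr's actual C++ kernel:

```python
import numpy as np

arr = np.array([10000, 20000], dtype=np.int64)

# NumPy's int64 power wraps modulo 2**64; both true results are
# divisible by 2**64, so the low 64 bits are exactly zero:
print(arr ** arr)  # [0 0]

# Computed in float64, the same powers overflow to +inf:
with np.errstate(over='ignore'):
    dbl = arr.astype(np.float64) ** arr.astype(np.float64)
print(dbl)  # [inf inf]

# Saturating the double result into the int64 range before converting
# back gives INT64_MAX, matching the numexpr 2.8.7 output above.
# (Done with Python ints here to avoid float64 rounding of INT64_MAX.)
i64 = np.iinfo(np.int64)
saturated = [min(max(v, i64.min), i64.max) for v in dbl.tolist()]
print(saturated)  # [9223372036854775807, 9223372036854775807]
```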
