pandas-dev/pandas

BUG: numexpr 2.8.5 changed integer overflow handling, failing a test

rebecca-palmer opened this issue · 9 comments

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

The test suite, specifically TestFrameFlexArithmetic.test_floordiv_axis0_numexpr_path[python-pow]

The test does integer pow() where most of the inputs are multiples of 100 (e.g. 20100**100), so the mathematically correct result is a multiple of 2**100. That is 0 mod 2**64, and plain pandas accordingly returns 0; however, this example uses an array large enough that the operation goes through numexpr by default.
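As a quick sanity check of that arithmetic (illustration only, not part of the test; the numpy wraparound shown is observed behaviour rather than a documented guarantee):

import numpy as np

# 20100 is a multiple of 100, so 20100**100 is divisible by 2**200 and hence
# by 2**64; the exact result reduced mod 2**64 is therefore 0.
assert pow(20100, 100, 2**64) == 0

# int64 array arithmetic in numpy wraps modulo 2**64 in practice, which is
# why the non-numexpr path returns 0 for this element.
print(np.array([20100], dtype=np.int64) ** 100)  # [0]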

Issue Description

With numexpr 2.8.5 it instead returns -2**63, and hence the test fails.

=================================== FAILURES ===================================
_____ TestFrameFlexArithmetic.test_floordiv_axis0_numexpr_path[python-pow] _____

self = <pandas.tests.frame.test_arithmetic.TestFrameFlexArithmetic object at 0x7fa71aebc1d0>
opname = 'pow'

    @pytest.mark.skipif(not NUMEXPR_INSTALLED, reason="numexpr not installed")
    @pytest.mark.parametrize("opname", ["floordiv", "pow"])
    def test_floordiv_axis0_numexpr_path(self, opname):
        # case that goes through numexpr and has to fall back to masked_arith_op
        op = getattr(operator, opname)

        arr = np.arange(_MIN_ELEMENTS + 100).reshape(_MIN_ELEMENTS // 100 + 1, -1) * 100
        df = DataFrame(arr)
        df["C"] = 1.0

        ser = df[0]
        result = getattr(df, opname)(ser, axis=0)

        expected = DataFrame({col: op(df[col], ser) for col in df.columns})
>       tm.assert_frame_equal(result, expected)

/usr/lib/python3/dist-packages/pandas/tests/frame/test_arithmetic.py:510:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/testing.pyx:52:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   AssertionError: DataFrame.iloc[:, 0] (column name="0") are different
E
E   DataFrame.iloc[:, 0] (column name="0") values are different (99.99 %)
E   [index]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
E   [left]: [1, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, ...]
E   [right]: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]

Expected Behavior

The test should pass.

I don't know whether pandas has documented integer overflow behaviour, but if it does, the result should follow that documentation.

Installed Versions

Happens with numexpr 2.8.5 and not with 2.8.4. (This is not #54449, though I do also see that bug: that one is an explicit exception, while this is a changed answer.)

Seen in both pandas 1.5.3 and pandas 2.0.3.

This is plausibly the result of numexpr fixing pydata/numexpr#434: int ** int in numexpr used to explicitly fail, making us fall back to the non-numexpr method, but it no longer does.

It might be possible to work around this by not using numexpr for **, but presumably with a performance cost.
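In the meantime, the only public switch I know of is the global option below; it is much coarser than a per-operator fix (it turns numexpr off for everything, not just **), so take it as a stopgap sketch rather than a proposed solution:

import pandas as pd

# Coarse workaround: disable numexpr-backed evaluation entirely, restoring the
# plain-numpy overflow behaviour at some performance cost on large frames.
pd.set_option("compute.use_numexpr", False)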

pandas doesn't have an obviously documented policy on integer overflow, and numpy's policy appears to be the same as C's (i.e. signed integer overflow is undefined). This suggests treating it as undefined / not-a-bug and changing the test's input to something that doesn't overflow (untested suggestion, sketched below: arr = np.arange(_MIN_ELEMENTS + 100).reshape(_MIN_ELEMENTS // 100 + 1, -1) % 7; as 7 doesn't divide 100, this also gives the division test both 0/0 and non0/0 cases).
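A rough sketch of that replacement input (untested; _MIN_ELEMENTS is the private threshold constant the existing test already uses):

import numpy as np
from pandas.core.computation.expressions import _MIN_ELEMENTS  # private constant

# Values 0..6 never overflow int64 under **, and because 7 does not divide the
# row width of 100, the floordiv case should still see both 0 // 0 and
# nonzero // 0, as the original input does.
arr = np.arange(_MIN_ELEMENTS + 100).reshape(_MIN_ELEMENTS // 100 + 1, -1) % 7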

However, this test was originally added in #31296, referencing a similar non-numexpr test in #31271, to check that column-by-column and whole-DataFrame operations give the same result, specifically for division by zero. Is that an implied policy that column-by-column and whole-DataFrame results must always be the same (and hence that numexpr and non-numexpr results must be the same, as it is easily possible for the whole frame to be big enough to default to numexpr while a single column is not, as illustrated below)?
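To make that size point concrete, here is an illustration (it uses the private _MIN_ELEMENTS threshold; the real check inside pandas also depends on dtypes, so "may use numexpr" is the most that can be said):

import numpy as np
import pandas as pd
from pandas.core.computation.expressions import _MIN_ELEMENTS  # private threshold

# The whole frame exceeds the element-count threshold, but each column on its
# own does not, so whole-frame arithmetic may go through numexpr while
# column-by-column arithmetic stays on the plain numpy path.
df = pd.DataFrame(np.ones((_MIN_ELEMENTS // 10, 20), dtype=np.int64))
print(df.size > _MIN_ELEMENTS)     # True
print(len(df[0]) > _MIN_ELEMENTS)  # False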

No reason was stated there for the power test using almost-all-overflowing input; it may have been copied from the division test.

@jbrockmendel ?

Is that an implied policy that column-by-column and whole-DataFrame results must always be the same

Yes, these should always match.

No reason was stated there for the power test using almost-all-overflowing input; it may have been copied from the division test.

No idea off the top of my head.

pandas doesn't have an obviously documented policy on integer overflow

I don't think this is non-obviously documented either, so "policy" is probably too strong a word. In general we'd rather raise than silently give incorrect results. What constitutes "incorrect" depends on whether the user is expecting modular arithmetic. My rule of thumb is that with signed int64 they are not, and with other integer dtypes they plausibly could be. That may just be me though.

What constitutes "incorrect" depends on whether the user is expecting modular arithmetic. My rule of thumb is that with signed int64 they are not, and with other integer dtypes they plausibly could be. That may just be me though.

The C (and numpy?) policy is that unsigned ints must wrap (modular arithmetic) but signed ints (of any width) can do anything.
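For what it's worth, both behaviours are easy to observe with numpy arrays (observed with current numpy; the signed case is exactly the "can do anything" part, so it is not something to rely on):

import numpy as np

# Unsigned overflow must wrap modulo 2**64:
print(np.array([2**64 - 1], dtype=np.uint64) + np.uint64(1))  # [0]

# Signed overflow is undefined in C; in practice numpy wraps it silently:
print(np.array([2**62], dtype=np.int64) * 2)  # [-9223372036854775808]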

Is that an implied policy that column-by-column and whole-DataFrame results must always be the same

Yes, these should always match.

Does that mean that while signed ints can do anything on overflow, they must do the same thing in both cases, and hence that we can't use numexpr where its overflow behaviour doesn't match plain pandas? If so, we can't use numexpr for int ** int. (This didn't come up before because numexpr int ** int used to explicitly fail (in all cases) due to bug pydata/numexpr#434.)

I don't know whether the other operators currently follow this rule: this test only tests overflowing input for pow, and I don't know what either numexpr or plain pandas do on overflow of other operations.

The non-pow operators (at least add, sub, mul) have wraparound overflow in numexpr and hence don't have this issue - it seems only pow gives -2**63 on overflow.
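A quick way to check that observation for a non-pow operator (assumes numexpr is installed; observed with recent versions, not a documented guarantee):

import numexpr as ne
import numpy as np

a = np.array([2**62], dtype=np.int64)
# int64 multiplication overflows with the same wraparound as plain numpy, so
# the numexpr and non-numexpr paths agree for these operators.
print(ne.evaluate("a * 2"))  # [-9223372036854775808], same as a * 2 in numpy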

It looks like the reason that pow is different from the other numexpr operators is that it's implemented by casting to double and back.
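If that is the cause, the -2**63 values are what an out-of-range float-to-int cast typically produces; a rough illustration with plain numpy (the cast result is platform-dependent and newer numpy versions may emit a RuntimeWarning here):

import numpy as np

# 100**100 computed in double precision is 1e200, far outside the int64 range;
# casting it back to int64 typically yields INT64_MIN, i.e. -2**63.
as_double = np.array([100.0]) ** 100
print(as_double.astype(np.int64))  # [-9223372036854775808] on common platforms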

The first part of this is an attempt to stop using it, but it does both more and less than we actually want: it disables numexpr pow in all dtypes (not just ints), but only in plain operators (not .eval()).