pydata/numexpr

Numexpr engine in pandas fails when using eval on a dataframe

em1208 opened this issue · 1 comment

I'm using DataFrame.eval in pandas to evaluate an expression, and the numexpr engine fails in several different ways once the expression contains more than 31 operands (columns).

I'm using the following packages:

pandas==2.2.0
numpy==1.26.4
numexpr==2.9.0

Here is the sample code to reproduce all the issues:

import itertools
import string
import traceback

import numpy as np
import pandas as pd

for num in [31, 32, 400, 650]:
    print('Number of cols:', num)
    # Two-letter column names: AB, AC, AD, ...
    pairs = list(itertools.permutations(string.ascii_uppercase, r=2))[:num]
    cols = [''.join(pair) for pair in pairs]
    # Sum of all columns: AB+AC+AD+...
    formula = '+'.join(cols)
    df = pd.DataFrame(np.random.randint(0, 100, size=(100, num)), columns=cols)
    print('Case python engine:', df.eval(formula, engine="python"))
    try:
        print('Case numexpr engine:', df.eval(formula, engine="numexpr"))
    except Exception:
        print(traceback.format_exc())
  • With 31 columns, everything works fine.
  • With 32 columns, numexpr raises ValueError: too many inputs.
  • With 400 columns, the numexpr engine fails with SyntaxError: too many nested parentheses (raised by Python's compile() inside numexpr's stringToExpression).
  • With 650 columns, the numexpr engine fails with RecursionError: maximum recursion depth exceeded (per the traceback, raised inside pandas while it converts the expression to a string, before numexpr is called).
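
The SyntaxError in the 400-column case can probably be reproduced without pandas or numexpr. The sketch below assumes that the string handed to numexpr has the left-nested form visible in the traceback, ((AB) + (AC)) + (AD) and so on; nested_sum and compiles are hypothetical helpers written for illustration, not part of either library.

```python
def nested_sum(names):
    """Build a left-nested sum such as ((a) + (b)) + (c).

    This mimics the string form shown in the traceback; it is an
    illustration, not the code pandas actually runs.
    """
    expr = names[0]
    for name in names[1:]:
        expr = f"({expr}) + ({name})"
    return expr


def compiles(source):
    """Return True if Python can compile `source` as an expression."""
    try:
        compile(source, "<expr>", "eval")
        return True
    except SyntaxError:
        return False


names = [f"c{i}" for i in range(400)]
print("31 operands compile:", compiles(nested_sum(names[:31])))
print("400 operands compile:", compiles(nested_sum(names)))
```

Every extra operand adds one more opening parenthesis at the start of the string, so 400 operands produce 399 nested levels. On CPython 3.11, the version used in this report, the parser seems to give up at roughly 200 levels; treat that figure as an assumption rather than a documented guarantee.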

With engine="python" there is no issue in any of these cases.

I have not yet looked at the numexpr code to find the cause, but I believe the general case should be supported.
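
Until the general case is supported, one possible workaround for pure sums is to evaluate the formula in pieces that stay under the operand limit and add the partial results. This is a minimal sketch, assuming that 31 operands per call is safe (as observed above) and that the expression is a plain sum; chunked_sum_formulas and eval_sum_in_chunks are hypothetical helper names, and df can be any object with a pandas-style eval(expr, engine=...) method.

```python
from functools import reduce
import operator

# Assumption: at most 31 input operands per numexpr call, as observed
# in this report (31 columns work, 32 fail).
MAX_OPERANDS = 31


def chunked_sum_formulas(cols, max_operands=MAX_OPERANDS):
    """Split 'A+B+C+...' into sub-formulas of at most max_operands names."""
    return ['+'.join(cols[i:i + max_operands])
            for i in range(0, len(cols), max_operands)]


def eval_sum_in_chunks(df, cols, engine="numexpr", max_operands=MAX_OPERANDS):
    """Evaluate each sub-formula with df.eval and add the partial results."""
    if not cols:
        raise ValueError("cols must contain at least one column name")
    partials = [df.eval(formula, engine=engine)
                for formula in chunked_sum_formulas(cols, max_operands)]
    # The partial results are added in Python, outside numexpr.
    return reduce(operator.add, partials)
```

With the DataFrame from the reproduction script, the call would be eval_sum_in_chunks(df, cols). This only works because addition can be regrouped freely; it does not generalize to arbitrary expressions. For a plain column sum, df[cols].sum(axis=1) avoids eval altogether.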

The full output follows; the last traceback is truncated:

Number of cols: 31
Case python engine: 0     1564
1     1771
2     1557
3     1655
4     1605
      ... 
95    1608
96    1509
97    1424
98    1503
99    1443
Length: 100, dtype: int64
Case numexpr engine: 0     1564
1     1771
2     1557
3     1655
4     1605
      ... 
95    1608
96    1509
97    1424
98    1503
99    1443
Length: 100, dtype: int64
Number of cols: 32
Case python engine: 0     1378
1     1411
2     1485
3     1719
4     1559
      ... 
95    1586
96    1704
97    1842
98    1900
99    1509
Length: 100, dtype: int64
Traceback (most recent call last):
  File "<ipython-input-1-a87e1c7c8207>", line 20, in <module>
    print('Case numexpr engine:', df.eval(formula, engine="numexpr"))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/frame.py", line 4937, in eval
    return _eval(expr, inplace=inplace, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/eval.py", line 357, in eval
    ret = eng_inst.evaluate()
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/engines.py", line 81, in evaluate
    res = self._evaluate()
          ^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/engines.py", line 121, in _evaluate
    return ne.evaluate(s, local_dict=scope)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/numexpr/necompiler.py", line 975, in evaluate
    return re_evaluate(local_dict=local_dict, global_dict=global_dict, _frame_depth=_frame_depth)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/numexpr/necompiler.py", line 1007, in re_evaluate
    return compiled_ex(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many inputs

Number of cols: 400
Case python engine: 0     19243
1     20062
2     20640
3     20546
4     19167
      ...  
95    20662
96    20529
97    20448
98    19507
99    19560
Length: 100, dtype: int64
Traceback (most recent call last):
  File "<ipython-input-1-a87e1c7c8207>", line 20, in <module>
    print('Case numexpr engine:', df.eval(formula, engine="numexpr"))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/frame.py", line 4937, in eval
    return _eval(expr, inplace=inplace, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/eval.py", line 357, in eval
    ret = eng_inst.evaluate()
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/engines.py", line 81, in evaluate
    res = self._evaluate()
          ^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/engines.py", line 121, in _evaluate
    return ne.evaluate(s, local_dict=scope)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/numexpr/necompiler.py", line 977, in evaluate
    raise e
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/numexpr/necompiler.py", line 874, in validate
    _names_cache[expr_key] = getExprNames(ex, context, sanitize=sanitize)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/numexpr/necompiler.py", line 723, in getExprNames
    ex = stringToExpression(text, {}, context, sanitize)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/numexpr/necompiler.py", line 293, in stringToExpression
    c = compile(s, '<expr>', 'eval', flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<expr>", line 1
    (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((AB) + (AC)) + (AD)) + (AE)) + (AF)) + (AG)) + (AH)) + (AI)) + (AJ)) + (AK)) + (AL)) + (AM)) + (AN)) + (AO)) + (AP)) + (AQ)) + (AR)) + (AS)) + (AT)) + (AU)) + (AV)) + (AW)) + (AX)) + (AY)) + (AZ)) + (BA)) + (BC)) + (BD)) + (BE)) + (BF)) + (BG)) + (BH)) + (BI)) + (BJ)) + (BK)) + (BL)) + (BM)) + (BN)) + (BO)) + (BP)) + (BQ)) + (BR)) + (BS)) + (BT)) + (BU)) + (BV)) + (BW)) + (BX)) + (BY)) + (BZ)) + (CA)) + (CB)) + (CD)) + (CE)) + (CF)) + (CG)) + (CH)) + (CI)) + (CJ)) + (CK)) + (CL)) + (CM)) + (CN)) + (CO)) + (CP)) + (CQ)) + (CR)) + (CS)) + (CT)) + (CU)) + (CV)) + (CW)) + (CX)) + (CY)) + (CZ)) + (DA)) + (DB)) + (DC)) + (DE)) + (DF)) + (DG)) + (DH)) + (DI)) + (DJ)) + (DK)) + (DL)) + (DM)) + (DN)) + (DO)) + (DP)) + (DQ)) + (DR)) + (DS)) + (DT)) + (DU)) + (DV)) + (DW)) + (DX)) + (DY)) + (DZ)) + (EA)) + (EB)) + (EC)) + (ED)) + (EF)) + (EG)) + (EH)) + (EI)) + (EJ)) + (EK)) + (EL)) + (EM)) + (EN)) + (EO)) + (EP)) + (EQ)) + (ER)) + (ES)) + (ET)) + (EU)) + (EV)) + (EW)) + (EX)) + (EY)) + (EZ)) + (FA)) + (FB)) + (FC)) + (FD)) + (FE)) + (FG)) + (FH)) + (FI)) + (FJ)) + (FK)) + (FL)) + (FM)) + (FN)) + (FO)) + (FP)) + (FQ)) + (FR)) + (FS)) + (FT)) + (FU)) + (FV)) + (FW)) + (FX)) + (FY)) + (FZ)) + (GA)) + (GB)) + (GC)) + (GD)) + (GE)) + (GF)) + (GH)) + (GI)) + (GJ)) + (GK)) + (GL)) + (GM)) + (GN)) + (GO)) + (GP)) + (GQ)) + (GR)) + (GS)) + (GT)) + (GU)) + (GV)) + (GW)) + (GX)) + (GY)) + (GZ)) + (HA)) + (HB)) + (HC)) + (HD)) + (HE)) + (HF)) + (HG)) + (HI)) + (HJ)) + (HK)) + (HL)) + (HM)) + (HN)) + (HO)) + (HP)) + (HQ)) + (HR)) + (HS)) + (HT)) + (HU)) + (HV)) + (HW)) + (HX)) + (HY)) + (HZ)) 
+ (IA)) + (IB)) + (IC)) + (ID)) + (IE)) + (IF)) + (IG)) + (IH)) + (IJ)) + (IK)) + (IL)) + (IM)) + (IN)) + (IO)) + (IP)) + (IQ)) + (IR)) + (IS)) + (IT)) + (IU)) + (IV)) + (IW)) + (IX)) + (IY)) + (IZ)) + (JA)) + (JB)) + (JC)) + (JD)) + (JE)) + (JF)) + (JG)) + (JH)) + (JI)) + (JK)) + (JL)) + (JM)) + (JN)) + (JO)) + (JP)) + (JQ)) + (JR)) + (JS)) + (JT)) + (JU)) + (JV)) + (JW)) + (JX)) + (JY)) + (JZ)) + (KA)) + (KB)) + (KC)) + (KD)) + (KE)) + (KF)) + (KG)) + (KH)) + (KI)) + (KJ)) + (KL)) + (KM)) + (KN)) + (KO)) + (KP)) + (KQ)) + (KR)) + (KS)) + (KT)) + (KU)) + (KV)) + (KW)) + (KX)) + (KY)) + (KZ)) + (LA)) + (LB)) + (LC)) + (LD)) + (LE)) + (LF)) + (LG)) + (LH)) + (LI)) + (LJ)) + (LK)) + (LM)) + (LN)) + (LO)) + (LP)) + (LQ)) + (LR)) + (LS)) + (LT)) + (LU)) + (LV)) + (LW)) + (LX)) + (LY)) + (LZ)) + (MA)) + (MB)) + (MC)) + (MD)) + (ME)) + (MF)) + (MG)) + (MH)) + (MI)) + (MJ)) + (MK)) + (ML)) + (MN)) + (MO)) + (MP)) + (MQ)) + (MR)) + (MS)) + (MT)) + (MU)) + (MV)) + (MW)) + (MX)) + (MY)) + (MZ)) + (NA)) + (NB)) + (NC)) + (ND)) + (NE)) + (NF)) + (NG)) + (NH)) + (NI)) + (NJ)) + (NK)) + (NL)) + (NM)) + (NO)) + (NP)) + (NQ)) + (NR)) + (NS)) + (NT)) + (NU)) + (NV)) + (NW)) + (NX)) + (NY)) + (NZ)) + (OA)) + (OB)) + (OC)) + (OD)) + (OE)) + (OF)) + (OG)) + (OH)) + (OI)) + (OJ)) + (OK)) + (OL)) + (OM)) + (ON)) + (OP)) + (OQ)) + (OR)) + (OS)) + (OT)) + (OU)) + (OV)) + (OW)) + (OX)) + (OY)) + (OZ)) + (PA)) + (PB)) + (PC)) + (PD)) + (PE)) + (PF)) + (PG)) + (PH)) + (PI)) + (PJ)) + (PK)) + (PL)) + (PM)) + (PN)) + (PO)) + (PQ)) + (PR)) + (PS)) + (PT)) + (PU)) + (PV)) + (PW)) + (PX)) + (PY)) + (PZ)
                                                                                                                                                                                                            ^
SyntaxError: too many nested parentheses

Number of cols: 650
Case python engine: 0     31058
1     32300
2     32281
3     31502
4     33150
      ...  
95    32228
96    32077
97    31442
98    32290
99    31135
Length: 100, dtype: int64
Traceback (most recent call last):
  File "<ipython-input-1-a87e1c7c8207>", line 20, in <module>
    print('Case numexpr engine:', df.eval(formula, engine="numexpr"))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/frame.py", line 4937, in eval
    return _eval(expr, inplace=inplace, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/eval.py", line 357, in eval
    ret = eng_inst.evaluate()
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/engines.py", line 81, in evaluate
    res = self._evaluate()
          ^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/engines.py", line 116, in _evaluate
    s = self.convert()
        ^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/engines.py", line 63, in convert
    return printing.pprint_thing(self.expr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/io/formats/printing.py", line 233, in pprint_thing
    result = as_escaped_string(thing)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/io/formats/printing.py", line 209, in as_escaped_string
    result = str(thing)
             ^^^^^^^^^^
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++TRUNCATED+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    result = as_escaped_string(thing)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/io/formats/printing.py", line 209, in as_escaped_string
    result = str(thing)
             ^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/ops.py", line 229, in __repr__
    return pprint_thing(f" {self.op} ".join(parened))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<replaced>/.pyenv/versions/3.11.7/envs/<replaced>/lib/python3.11/site-packages/pandas/core/computation/ops.py", line 228, in <genexpr>
    parened = (f"({pprint_thing(opr)})" for opr in self.operands)
                   ^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded

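The limit of 32 operands actually comes from NumPy: see numpy/numpy#4398. Presumably the output array counts as one of the 32, which would explain why 31 inputs still work.

I think NumPy 2.0 will raise the limit to 64 (https://numpy.org/devdocs/reference/c-api/array.html#c.NPY_MAXARGS).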