pola-rs/polars

panic when writing Decimal to Parquet

Closed this issue · 3 comments

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import os
os.environ['POLARS_VERBOSE'] = '1'

import pyarrow as pa
import polars as pl
pl.Config.activate_decimals()

df = pa.Table.from_arrays(
    [pa.array([1, 12, 17, 23, 28], type=pa.decimal128(38,9))],
    names=['nums']
)

df1 = pl.from_arrow(df)
#df2 = pl.DataFrame([pl.Series('nums', [1, 12, 17, 23, 28], dtype=pl.Decimal(9, 38))])

df1.write_parquet("test.parquet")

Log output

thread '<unnamed>' panicked at /home/runner/work/polars/polars/crates/polars-arrow/src/compute/aggregate/memory.rs:45:33:
operator does not support primitive `Int128`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
/tmp/ipykernel_2018/362971460.py in <module>
     14 #df2 = pl.DataFrame([pl.Series('nums', [1, 12, 17, 23, 28], dtype=pl.Decimal(9, 38))])
     15 
---> 16 df1.write_parquet("test.parquet")

/opt/conda/lib/python3.9/site-packages/polars/dataframe/frame.py in write_parquet(self, file, compression, compression_level, statistics, row_group_size, use_pyarrow, pyarrow_options)
   3395 
   3396         else:
-> 3397             self._df.write_parquet(
   3398                 file, compression, compression_level, statistics, row_group_size
   3399             )

PanicException: operator does not support primitive `Int128`

Issue description

Polars crashes when writing Decimal128 to a Parquet file.

This is Polars 0.19.12, the newest release at the time of writing. This may be a regression of #8191, which describes this functionality as working.
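
The traceback shows that DataFrame.write_parquet also accepts a use_pyarrow argument, so a possible (unverified) workaround is to delegate the write to pyarrow's Parquet writer instead of the native Rust one, which is where the panic originates. Rough sketch:

import pyarrow as pa
import polars as pl

pl.Config.activate_decimals()

df1 = pl.from_arrow(
    pa.Table.from_arrays(
        [pa.array([1, 12, 17, 23, 28], type=pa.decimal128(38, 9))],
        names=['nums'],
    )
)

# Possible workaround: write via pyarrow rather than the native Rust writer.
df1.write_parquet("test.parquet", use_pyarrow=True)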

Expected behavior

I expect the code to write a valid Parquet file using the Decimal128 type.
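
To make the expectation concrete, a round trip like the sketch below should succeed, with the column stored as decimal(38, 9) in the file and read back as a Decimal series (assumes the test.parquet file written in the example above):

import pyarrow.parquet as pq
import polars as pl

pl.Config.activate_decimals()

# The Parquet schema should carry the decimal(38, 9) logical type ...
schema = pq.read_schema("test.parquet")
print(schema.field('nums').type)

# ... and Polars should read it back as a Decimal column with the same values.
out = pl.read_parquet("test.parquet")
assert out['nums'].dtype == pl.Decimal
print(out['nums'])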

Installed versions

--------Version info---------
Polars:              0.19.12
Index type:          UInt32
Platform:            Linux-4.14.326-245.539.amzn2.x86_64-x86_64-with-glibc2.31
Python:              3.9.13 (main, Aug 25 2022, 23:26:10) 
[GCC 11.2.0]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         2.0.0
connectorx:          <not installed>
deltalake:           <not installed>
fsspec:              2022.7.1
gevent:              <not installed>
matplotlib:          3.5.2
numpy:               1.21.5
openpyxl:            3.0.10
pandas:              1.3.2
pyarrow:             13.0.0
pydantic:            <not installed>
pyiceberg:           <not installed>
pyxlsb:              <not installed>
sqlalchemy:          1.4.39
xlsx2csv:            <not installed>
xlsxwriter:          3.0.3

Sorry, meant to add: creating the DataFrame directly in Polars (df2 in the code above) leads to the same error.
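
For completeness, the pure-Polars construction I mean is roughly the commented-out df2 line above, i.e. something like the following sketch (I believe the order of pl.Decimal's scale/precision arguments has changed between Polars versions, so it may need adjusting):

import polars as pl

pl.Config.activate_decimals()

# Same data built directly in Polars; write_parquet panics in the same way.
# pl.Decimal's argument order (scale, precision vs. precision, scale) depends
# on the Polars version -- adjust if construction fails.
df2 = pl.DataFrame([pl.Series('nums', [1, 12, 17, 23, 28], dtype=pl.Decimal(9, 38))])
df2.write_parquet("test.parquet")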

I can confirm that this code used to work:

import decimal
import polars as pl
pl.Config.activate_decimals()

# create dataframe
data = {
    'hi': [True, False, True, False],
    'bye': [1, 2, 3, decimal.Decimal(47283957238957239875)]
}
df = pl.DataFrame(data)
assert df['bye'].dtype == pl.Decimal

# write file
df.write_parquet('decimal_test.parquet')

But with 0.19.13 it throws: pyo3_runtime.PanicException: operator does not support primitive `Int128`
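
As a stopgap rather than a fix, casting the Decimal column to a float before writing avoids the Int128 code path. This is only a sketch, it loses the exact decimal values, and it assumes the cast itself works on the affected versions:

import decimal
import polars as pl

pl.Config.activate_decimals()

data = {
    'hi': [True, False, True, False],
    'bye': [1, 2, 3, decimal.Decimal(47283957238957239875)]
}
df = pl.DataFrame(data)

# Workaround sketch: cast Decimal -> Float64 before writing.
# Exact fixed-point values are lost, so this is only a temporary measure.
df.with_columns(pl.col('bye').cast(pl.Float64)).write_parquet('decimal_test.parquet')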

jkc1 commented

I ran into what appears to be a related issue. A git bisect identified polars==0.19.9 as the version that introduced the breakage. I also observed that the error differs between 0.19.9 and 0.19.13: in 0.19.13 it is the one reported above, but in 0.19.9 it was "called Option::unwrap() on a 'None' value". Not sure if that last bit of additional information is helpful.
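
In case it helps anyone else bisecting or pinning versions, a minimal check script along these lines (same repro data as above) can be run under each candidate release, e.g. after pip install polars==<version>:

# Minimal check to run under each candidate Polars version.
import traceback

import pyarrow as pa
import polars as pl

pl.Config.activate_decimals()
print('polars', pl.__version__)

df = pl.from_arrow(
    pa.Table.from_arrays(
        [pa.array([1, 12, 17, 23, 28], type=pa.decimal128(38, 9))],
        names=['nums'],
    )
)

try:
    df.write_parquet("test.parquet")
    print('OK: wrote test.parquet')
except BaseException:
    # Catch broadly: pyo3's PanicException may not derive from Exception.
    traceback.print_exc()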