panic when writing Decimal to Parquet
Closed this issue · 3 comments
Checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import os
os.environ['POLARS_VERBOSE'] = '1'
import pyarrow as pa
import polars as pl
pl.Config.activate_decimals()
df = pa.Table.from_arrays(
[pa.array([1, 12, 17, 23, 28], type=pa.decimal128(38,9))],
names=['nums']
)
df1 = pl.from_arrow(df)
#df2 = pl.DataFrame([pl.Series('nums', [1, 12, 17, 23, 28], dtype=pl.Decimal(9, 38))])
df1.write_parquet("test.parquet")
Log output
thread '<unnamed>' panicked at /home/runner/work/polars/polars/crates/polars-arrow/src/compute/aggregate/memory.rs:45:33:
operator does not support primitive `Int128`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
/tmp/ipykernel_2018/362971460.py in <module>
14 #df2 = pl.DataFrame([pl.Series('nums', [1, 12, 17, 23, 28], dtype=pl.Decimal(9, 38))])
15
---> 16 df1.write_parquet("test.parquet")
/opt/conda/lib/python3.9/site-packages/polars/dataframe/frame.py in write_parquet(self, file, compression, compression_level, statistics, row_group_size, use_pyarrow, pyarrow_options)
3395
3396 else:
-> 3397 self._df.write_parquet(
3398 file, compression, compression_level, statistics, row_group_size
3399 )
PanicException: operator does not support primitive `Int128`
Issue description
Polars panics when writing a Decimal128 column to a Parquet file.
This is Polars 0.19.12, the newest version at the time of writing. This might be a regression of #8191, which describes this functionality as working.
Expected behavior
I expect the code to write a valid Parquet file using the Decimal128 type.
Installed versions
--------Version info---------
Polars: 0.19.12
Index type: UInt32
Platform: Linux-4.14.326-245.539.amzn2.x86_64-x86_64-with-glibc2.31
Python: 3.9.13 (main, Aug 25 2022, 23:26:10)
[GCC 11.2.0]
----Optional dependencies----
adbc_driver_sqlite: <not installed>
cloudpickle: 2.0.0
connectorx: <not installed>
deltalake: <not installed>
fsspec: 2022.7.1
gevent: <not installed>
matplotlib: 3.5.2
numpy: 1.21.5
openpyxl: 3.0.10
pandas: 1.3.2
pyarrow: 13.0.0
pydantic: <not installed>
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: 1.4.39
xlsx2csv: <not installed>
xlsxwriter: 3.0.3
Sorry, meant to add: creating the DataFrame directly in Polars (df2 in the code above) leads to the same error.
I can confirm that this code used to work:
import decimal
import polars as pl
pl.Config.activate_decimals()
# create dataframe
data = {
'hi': [True, False, True, False],
'bye': [1, 2, 3, decimal.Decimal(47283957238957239875)]
}
df = pl.DataFrame(data)
assert df['bye'].dtype == pl.Decimal
# write file
df.write_parquet('decimal_test.parquet')
But using 0.19.13 it throws: pyo3_runtime.PanicException: operator does not support primitive Int128
I ran into what appears to be a related issue. I ran a git bisect that identified polars==0.19.9 as the version that introduced the breakage. I also observed that the error differs between 0.19.9 and 0.19.13: in 0.19.13 the error is the one reported above, but in 0.19.9 it was called `Option::unwrap()` on a `None` value. Not sure if that last bit of additional information is helpful.