pola-rs/polars

`pl.read_parquet()` and `pl.write_parquet()` for `pl.Decimal`

Closed this issue · 2 comments

Problem description

Following the progress on pl.Decimal has been very exciting:

  1. oldest thread on Decimal
  2. PR for physical i128 type
  3. Decimal design discussion
  4. PR for Decimal series

All of these issues have been merged/closed and it is now possible to create Decimal series.

I don't know whether parquet IO functions are the next logical step, but I wanted to create this issue to specifically flag the use of Decimal in parquet.

Minimal example of writing and reading parquet with Decimal:

import decimal
import polars as pl

# create dataframe
data = {
    'hi': [True, False, True, False],
    'bye': [1, 2, 3, decimal.Decimal(47283957238957239875)]
}
df = pl.DataFrame(data)
assert df['bye'].dtype == pl.Decimal

# write file
df.write_parquet('decimal_test.parquet')

# read file
df2 = pl.read_parquet('decimal_test.parquet')

# test that pl.Decimal is dtype (this fails, column has dtype pl.Float64)
assert df2['bye'].dtype == pl.Decimal

# check that DataFrames are equal (this fails, equality comparison not implemented)
assert df.frame_equal(df2)

Running `pqrs schema decimal_test.parquet` reveals that the written parquet file uses column type `DOUBLE`.
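To illustrate why a `DOUBLE` column matters for the value in the example, here is a minimal sketch using only the Python standard library (it does not call polars or parquet at all). It assumes that storing the value as `DOUBLE` behaves like converting it to a 64-bit Python `float`. The variable names are illustrative only.

```python
import decimal

# The large value from the example; it has 20 significant digits.
original = decimal.Decimal(47283957238957239875)

# Assumption: a parquet DOUBLE column holds the same thing as a 64-bit float.
as_double = float(original)

# Decimal(float) converts exactly, so any difference from `original`
# was introduced by the float conversion, not by this step.
round_tripped = decimal.Decimal(as_double)

# True when the float round trip changed the value.
lost_precision = round_tripped != original

# True when the value fits a signed 128-bit integer, the physical type
# mentioned in the linked "PR for physical i128 type".
fits_in_i128 = abs(int(original)) < 2**127

print(lost_precision, fits_in_i128)
```

The value is an odd integer larger than 2**53, so a 64-bit float cannot hold it exactly, while a 128-bit integer has ample room for it.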

Did you run pl.Config.activate_decimals()?

Activating decimals makes write_parquet work. Thanks!