`pl.read_parquet()` and `pl.write_parquet()` for `pl.Decimal`

Question

Closed this issue a year ago · 2 comments

Problem description

Following the progress on pl.Decimal has been very exciting:

All of these issues have been merged/closed and it is now possible to create Decimal series.

I don't know whether the Parquet IO functions are the next logical step, but I wanted to create this issue to specifically flag the use of Decimal in Parquet.

Minimal example of writing and reading parquet with Decimal:

import decimal
import polars as pl

# create dataframe
data = {
    'hi': [True, False, True, False],
    'bye': [1, 2, 3, decimal.Decimal(47283957238957239875)]
}
df = pl.DataFrame(data)
assert df['bye'].dtype == pl.Decimal

# write file
df.write_parquet('decimal_test.parquet')

# read file
df2 = pl.read_parquet('decimal_test.parquet')

# test that pl.Decimal is dtype (this fails, column has dtype pl.Float64)
assert df2['bye'].dtype == pl.Decimal

# check that DataFrames are equal (this fails, equality comparison not implemented)
assert df.frame_equal(df2)

Running `pqrs schema decimal_test.parquet` reveals that the written Parquet file uses column type DOUBLE.
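To show why a DOUBLE column matters here beyond the dtype mismatch, below is a minimal stdlib-only sketch (no Polars involved) of what happens to the large value from the example when it passes through a 64-bit float. It assumes the column is stored as IEEE 754 binary64, which is what Parquet's DOUBLE type is; the variable names are just for illustration.

```python
import decimal

# The large value from the example above; it has 20 significant digits.
original = decimal.Decimal(47283957238957239875)

# Simulate storing it in a 64-bit float column (assumption: DOUBLE == binary64).
as_double = float(original)

# Convert back exactly; Decimal(float) keeps the float's exact binary value.
round_tripped = decimal.Decimal(as_double)

print(original)
print(round_tripped)
print(round_tripped == original)  # False: the value is not representable as a double
```

At this magnitude adjacent doubles are 8192 apart, so the low digits cannot survive the round trip even if the column is cast back to a decimal type after reading.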

Answer 1 · 2023-04-13T08:00:42.000Z

Did you run pl.Config.activate_decimals()?

Answer 2 · 2023-04-15T23:21:08.000Z

Activating decimals makes write_parquet work. Thanks.