`pl.read_parquet()` and `pl.write_parquet()` for `pl.Decimal`
Closed this issue · 2 comments
sslivkoff commented
Problem description
Following the progress on pl.Decimal has been very exciting:
All of these issues have been merged/closed, and it is now possible to create Decimal series.
I don't know whether parquet IO functions are the next logical step, but I wanted to open this issue specifically to flag the use of Decimal in parquet.
Minimal example of writing and reading parquet with Decimal:
import decimal
import polars as pl
# create dataframe
data = {
'hi': [True, False, True, False],
'bye': [1, 2, 3, decimal.Decimal(47283957238957239875)]
}
df = pl.DataFrame(data)
assert df['bye'].dtype == pl.Decimal
# write file
df.write_parquet('decimal_test.parquet')
# read file
df2 = pl.read_parquet('decimal_test.parquet')
# check that the column comes back as pl.Decimal (this fails: the column has dtype pl.Float64)
assert df2['bye'].dtype == pl.Decimal
# check that DataFrames are equal (this fails, equality comparison not implemented)
assert df.frame_equal(df2)
Running pqrs schema decimal_test.parquet shows that the parquet file was written with column type DOUBLE.
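Why a DOUBLE column is a problem here: a 64-bit float has a 53-bit significand, so an integer as large as 47283957238957239875 cannot be stored exactly. The minimal sketch below uses only the standard library (no polars or parquet involved), and the helper name survives_float_roundtrip is hypothetical. It shows the value changing once it passes through a float, which is what a DOUBLE column forces.

```python
from decimal import Decimal


def survives_float_roundtrip(value: Decimal) -> bool:
    """Return True if converting to a 64-bit float and back preserves the value exactly."""
    # Decimal(float) converts the binary double exactly, so any mismatch
    # here is precision lost in the float conversion itself.
    return Decimal(float(value)) == value


original = Decimal(47283957238957239875)
as_double = float(original)  # the closest value a DOUBLE column can hold

print(as_double)
print(survives_float_roundtrip(original))    # False: the value is far above 2**53
print(survives_float_roundtrip(Decimal(3)))  # True: small integers are exact
```

Small values like 1, 2 and 3 in the same column survive the conversion, so the loss only shows up for large or high-precision entries.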
ritchie46 commented
Did you run pl.Config.activate_decimals()?
sslivkoff commented
Activating decimals makes write_parquet work. Thanks!