Decimal types in Parquet files cannot be converted to Ruby
Issue closed · 1 comment
simbasdad commented
I have a Parquet file with a decimal column:
stringio = StringIO.new(File.binread("example.parquet"))
df = T.let(Polars.read_parquet(stringio), Polars::DataFrame)
puts df[["revenue"]]
shape: (438, 1)
┌────────────────┐
│ revenue │
│ --- │
│ decimal[.20,3] │
╞════════════════╡
│ 409.59 │
│ 72 │
│ 584.34 │
│ 5 │
│ … │
│ 241.71 │
│ 15.11 │
│ 78.16 │
│ 147 │
└────────────────┘
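The header decimal[.20,3] marks a fixed-point column. Reading it as precision 20 and scale 3 is an assumption based on the display. Parquet and Arrow decimals are stored as an unscaled integer plus a scale, so at scale 3 the value 409.59 is the integer 409590 and 72 is 72000. The sketch below shows how such a pair maps losslessly onto Ruby's BigDecimal. decimal_from_unscaled is a hypothetical helper, not part of polars-df:

```ruby
require "bigdecimal"

# Hypothetical helper (not part of polars-df): rebuild an exact decimal
# from its unscaled integer and a non-negative scale,
# i.e. value = unscaled * 10**-scale.
def decimal_from_unscaled(unscaled, scale)
  raise ArgumentError, "scale must be non-negative" if scale.negative?

  # Build the literal "<unscaled>e-<scale>" so no binary float is involved
  BigDecimal("#{unscaled}e-#{scale}")
end

puts decimal_from_unscaled(409_590, 3).to_s("F") # => 409.59
puts decimal_from_unscaled(72_000, 3).to_s("F")
```

Because the value never passes through a Float, the result compares equal to the exact fraction 40959/100.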
When I try to convert the DataFrame to an array of hashes with to_hashes, it panics:
df.to_hashes
thread '<unnamed>' panicked at 'not yet implemented', ext/polars/src/conversion.rs:209:46
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic
3: <polars::conversion::Wrap<polars_core::datatypes::any_value::AnyValue> as magnus::into_value::IntoValue>::into_value_with
4: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
5: <magnus::r_array::RArray as core::iter::traits::collect::FromIterator<T>>::from_iter
6: polars::dataframe::RbDataFrame::row_tuple
7: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
8: polars::init::anon
9: _vm_call_cfunc_with_frame
10: _vm_sendish
11: _vm_exec_core
12: _rb_vm_exec
13: _invoke_block_from_c_bh
14: _rb_yield_values2
15: _collect_i
16: _invoke_block_from_c_bh
17: _rb_yield_1
18: _int_dotimes
19: _vm_call0_body
20: _rb_call0
21: _rb_iterate0
22: _rb_block_call_kw
23: _vm_call0_body
24: _rb_call0
25: _rb_iterate0
26: _rb_lambda_call
27: _enum_collect
28: _vm_call_cfunc_with_frame
29: _vm_sendish
30: _vm_exec_core
31: _rb_vm_exec
32: _rb_ec_exec_node
33: _ruby_run_node
34: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
~/.gem/ruby/3.2.1/gems/polars-df-0.5.0-arm64-darwin/lib/polars/data_frame.rb:763:in `row_tuple': not yet implemented (fatal)
from ~/.gem/ruby/3.2.1/gems/polars-df-0.5.0-arm64-darwin/lib/polars/data_frame.rb:763:in `block in to_hashes'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.5.0-arm64-darwin/lib/polars/data_frame.rb:762:in `times'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.5.0-arm64-darwin/lib/polars/data_frame.rb:762:in `each'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.5.0-arm64-darwin/lib/polars/data_frame.rb:762:in `map'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.5.0-arm64-darwin/lib/polars/data_frame.rb:762:in `to_hashes'
from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:9:in `<main>'
It appears this can be worked around by casting the column to f64:
df = df.select(
[
Polars.col("revenue").cast(:f64),
],
)
This works, but it is inconvenient: the conversion happens in a base class that processes many files and is not schema-aware, so it does not know which columns need casting.
ankane commented
Thanks @simbasdad! Improved support for the Decimal type in the commit above.
Note: Casting to f64 will lose precision.
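The precision loss comes from f64 being a binary floating-point format: a value such as 409.59 has no exact binary representation, so a cast keeps the nearest double instead. This stdlib-only snippet (plain Float and BigDecimal, no polars-df involved) makes the difference visible by comparing each value with the exact fraction 40959/100:

```ruby
require "bigdecimal"

exact = Rational(40_959, 100)     # 409.59 as an exact fraction

as_float = 409.59                 # what a cast to f64 keeps
as_decimal = BigDecimal("409.59") # decimal digits kept exactly

puts as_float.to_r == exact   # => false (nearest binary fraction)
puts as_decimal.to_r == exact # => true

# The same rounding error shows up in ordinary arithmetic
puts 0.1 + 0.2 == 0.3                                           # => false
puts BigDecimal("0.1") + BigDecimal("0.2") == BigDecimal("0.3") # => true
```

This is why BigDecimal, not Float, is the lossless Ruby counterpart of a decimal column; the thread does not state which Ruby type polars-df returns after the fix.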