Support Lists of Structs
I am attempting to migrate to polars-ruby from the red-parquet gem. Almost everything went smoothly, but I'm running into issues processing a parquet file that contains lists of structs.
The file loads fine:
stringio = StringIO.new(File.binread("example.snappy.parquet"))
df = Polars.read_parquet(stringio)
puts df
shape: (1_281, 3)
┌─────────────┬───────────────────────────────────┬────────────────────────────┐
│ id ┆ complex_column ┆ last_processed_at │
│ --- ┆ --- ┆ --- │
│ i64 ┆ list[struct[7]] ┆ datetime[μs] │
╞═════════════╪═══════════════════════════════════╪════════════════════════════╡
│ 29 ┆ [{"off",255942,"pumbaa","pumbaa"… ┆ 2022-07-25 15:14:04.554156 │
│ 2509 ┆ [{"off",30373,"timon","timon","a… ┆ 2022-07-25 15:14:04.554156 │
│ 7225 ┆ [{"off",468307,"simba","simba","… ┆ 2022-07-25 15:14:04.554156 │
│ 13518 ┆ [{"off",381746,"rafiki","rafiki"… ┆ 2022-07-25 15:14:04.554156 │
│ … ┆ … ┆ … │
│ 8419 ┆ [{"off",245853,"sebastian","seba… ┆ 2022-07-25 15:14:04.554156 │
│ 11536 ┆ [{"off",317070,"flounder","floun… ┆ 2022-07-25 15:14:04.554156 │
│ 14174 ┆ [{"off",447375,"ariel","ariel","… ┆ 2022-07-25 15:14:04.554156 │
│ 14850 ┆ [{"off",435707,"triton","triton"… ┆ 2022-07-25 15:14:04.554156 │
└─────────────┴───────────────────────────────────┴────────────────────────────┘
Things go bad when I attempt to use it:
df.rows(named: true).each {}
thread '<unnamed>' panicked at 'not yet implemented', ext/polars/src/series.rs:544:22
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic
3: polars::series::RbSeries::to_a
4: <polars::conversion::Wrap<polars_core::datatypes::any_value::AnyValue> as magnus::into_value::IntoValue>::into_value_with
5: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
6: <magnus::r_array::RArray as core::iter::traits::collect::FromIterator<T>>::from_iter
7: <magnus::r_array::RArray as core::iter::traits::collect::FromIterator<T>>::from_iter
8: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
9: polars::init::anon
10: _vm_call_cfunc_with_frame
11: _vm_sendish
12: _vm_exec_core
13: _rb_vm_exec
14: _rb_ec_exec_node
15: _ruby_run_node
16: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4423:in `row_tuples': not yet implemented (fatal)
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4423:in `rows'
from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:13:in `<main>'
or
col = df.get_column("complex_column")
col.each {}
thread '<unnamed>' panicked at 'not yet implemented', ext/polars/src/series.rs:544:22
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic
3: polars::series::RbSeries::to_a
4: <polars::conversion::Wrap<polars_core::datatypes::any_value::AnyValue> as magnus::into_value::IntoValue>::into_value_with
5: polars::series::RbSeries::get_idx
6: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
7: polars::init::anon
8: _vm_call_cfunc_with_frame
9: _vm_sendish
10: _vm_exec_core
11: _rb_vm_exec
12: _invoke_block_from_c_bh
13: _rb_yield_1
14: _int_dotimes
15: _vm_call_cfunc_with_frame
16: _vm_sendish
17: _vm_exec_core
18: _rb_vm_exec
19: _rb_ec_exec_node
20: _ruby_run_node
21: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:282:in `get_idx': not yet implemented (fatal)
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:282:in `[]'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:269:in `block in each'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:268:in `times'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:268:in `each'
from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:11:in `<main>'
Prior to discovering that this happened when working with a parquet file, I had attempted to create a DataFrame in a unit test. Unfortunately, I wasn't able to create the DataFrame either:
df = Polars::DataFrame.new(
  {
    a: [[{}], [{}], [{}]],
  },
)
~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/utils.rb:129:in `rescue in rb_type_to_dtype': Conversion of Ruby data type Hash to Polars data type not implemented. (ArgumentError)
raise ArgumentError, "Conversion of Ruby data type #{data_type} to Polars data type not implemented."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/utils.rb:126:in `rb_type_to_dtype'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:3779:in `sequence_to_rbseries'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:69:in `initialize'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `new'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `read_hash'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `hash_to_rbdf'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:35:in `initialize'
from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:26:in `new'
from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:26:in `<main>'
~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/utils.rb:127:in `fetch': key not found: Hash (KeyError)
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/utils.rb:127:in `rb_type_to_dtype'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:3779:in `sequence_to_rbseries'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:69:in `initialize'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `new'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `read_hash'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `hash_to_rbdf'
from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:35:in `initialize'
from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:26:in `new'
from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:26:in `<main>'
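The two traces above suggest a pattern: a lookup keyed by Ruby class raises KeyError for Hash, and that error is rescued and re-raised as ArgumentError. The following is a minimal pure-Ruby sketch of that pattern, not the gem's actual code. The table contents, the dtype symbols, and the infer_dtype helper are hypothetical; only the method name rb_type_to_dtype and the error message are taken from the trace.

```ruby
# Hypothetical lookup table for illustration; NOT the gem's real mapping.
RB_TYPE_TO_DTYPE = {
  Integer => :i64,
  Float => :f64,
  String => :str,
  TrueClass => :bool,
  FalseClass => :bool,
}.freeze

# Mirrors the failure mode in the trace: fetch raises KeyError for an
# unmapped class, which is rescued and re-raised as ArgumentError.
def rb_type_to_dtype(data_type)
  RB_TYPE_TO_DTYPE.fetch(data_type)
rescue KeyError
  raise ArgumentError, "Conversion of Ruby data type #{data_type} to Polars data type not implemented."
end

# Hypothetical helper: infer a dtype from the first non-nil element,
# recursing into nested arrays so [[1], [2]] becomes [:list, :i64].
def infer_dtype(values)
  first = values.compact.first
  return [:list, infer_dtype(first)] if first.is_a?(Array)

  rb_type_to_dtype(first.class)
end

p infer_dtype([1, 2, 3])   # :i64
p infer_dtype([[1], [2]])  # [:list, :i64]

begin
  infer_dtype([[{}], [{}], [{}]])
rescue ArgumentError => e
  puts e.message        # Conversion of Ruby data type Hash to Polars data type not implemented.
  puts e.cause.message  # key not found: Hash
end
```

Because the ArgumentError is raised inside the rescue clause, Ruby records the KeyError as its cause, which would explain why both exceptions appear in the output above. Under this sketch, supporting structs would mean adding a Hash branch to the inference step rather than only a new table entry.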
Hi @simbasdad, thanks for reporting! Added support for converting lists and structs to Ruby in the commit above. Will take a look at the data frame constructor when I have a chance.
@ankane Just wanted to say thank you for the quick turnaround. Working with this gem has been a great experience.
We were running into the same problem, and pointing to the master branch fixes it for us. Thank you for the fix @ankane! Any idea when the next version will be released?
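For anyone wanting to try the same workaround before a release, pointing Bundler at the master branch might look like the Gemfile fragment below. The repository path ankane/polars-ruby is an assumption inferred from the project and maintainer names in this thread. Because the gem includes a Rust extension (see ext/polars/src in the traces above), installing from git will likely require a Rust toolchain.

```ruby
# Gemfile (sketch): install polars-df from the master branch instead of a release.
# The repository path is an assumption, not confirmed in this thread.
source "https://rubygems.org"

gem "polars-df", github: "ankane/polars-ruby", branch: "master"
```

Once a release containing the fix is available, the line can be switched back to a normal version constraint.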
Fixed the constructor in the commit above.
@sambostock There are more changes I'd like to make before 0.5.0, so don't have a timeline.
Got through everything, so just pushed a new release.
@ankane Thanks so much! I was able to successfully switch from red-parquet to polars-df.
I did run into two usability issues that I managed to work around. I was hoping to contribute fixes, but I'm unsure where to begin and whether the changes are actually desired.
Since I don't want to seem ungrateful, I just wanted to check that you're ok with me creating some low priority issues for your consideration.
Feel free to create issues for feature requests.