ankane/ruby-polars

Support Lists of Structs

Closed this issue · 7 comments

I am attempting to migrate to polars-ruby from the red-parquet gem. Almost everything went smoothly, but I'm running into issues processing a parquet file that contains lists of structs.

The file loads fine:

stringio = StringIO.new(File.binread("example.snappy.parquet"))
df = Polars.read_parquet(stringio)
puts df
shape: (1_281, 3)
┌─────────────┬───────────────────────────────────┬────────────────────────────┐
│ id          ┆ complex_column                    ┆ last_processed_at          │
│ ---         ┆ ---                               ┆ ---                        │
│ i64         ┆ list[struct[7]]                   ┆ datetime[μs]               │
╞═════════════╪═══════════════════════════════════╪════════════════════════════╡
│ 29          ┆ [{"off",255942,"pumbaa","pumbaa"… ┆ 2022-07-25 15:14:04.554156 │
│ 2509        ┆ [{"off",30373,"timon","timon","a… ┆ 2022-07-25 15:14:04.554156 │
│ 7225        ┆ [{"off",468307,"simba","simba","… ┆ 2022-07-25 15:14:04.554156 │
│ 13518       ┆ [{"off",381746,"rafiki","rafiki"… ┆ 2022-07-25 15:14:04.554156 │
│ …           ┆ …                                 ┆ …                          │
│ 8419        ┆ [{"off",245853,"sebastian","seba… ┆ 2022-07-25 15:14:04.554156 │
│ 11536       ┆ [{"off",317070,"flounder","floun… ┆ 2022-07-25 15:14:04.554156 │
│ 14174       ┆ [{"off",447375,"ariel","ariel","… ┆ 2022-07-25 15:14:04.554156 │
│ 14850       ┆ [{"off",435707,"triton","triton"… ┆ 2022-07-25 15:14:04.554156 │
└─────────────┴───────────────────────────────────┴────────────────────────────┘

It fails as soon as I try to read the values, for example when iterating over rows:

df.rows(named: true).each {}
thread '<unnamed>' panicked at 'not yet implemented', ext/polars/src/series.rs:544:22
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   3: polars::series::RbSeries::to_a
   4: <polars::conversion::Wrap<polars_core::datatypes::any_value::AnyValue> as magnus::into_value::IntoValue>::into_value_with
   5: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
   6: <magnus::r_array::RArray as core::iter::traits::collect::FromIterator<T>>::from_iter
   7: <magnus::r_array::RArray as core::iter::traits::collect::FromIterator<T>>::from_iter
   8: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
   9: polars::init::anon
  10: _vm_call_cfunc_with_frame
  11: _vm_sendish
  12: _vm_exec_core
  13: _rb_vm_exec
  14: _rb_ec_exec_node
  15: _ruby_run_node
  16: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4423:in `row_tuples': not yet implemented (fatal)
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4423:in `rows'
	from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:13:in `<main>'

or when iterating over the column directly:

col = df.get_column("complex_column")
col.each {}
thread '<unnamed>' panicked at 'not yet implemented', ext/polars/src/series.rs:544:22
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   3: polars::series::RbSeries::to_a
   4: <polars::conversion::Wrap<polars_core::datatypes::any_value::AnyValue> as magnus::into_value::IntoValue>::into_value_with
   5: polars::series::RbSeries::get_idx
   6: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
   7: polars::init::anon
   8: _vm_call_cfunc_with_frame
   9: _vm_sendish
  10: _vm_exec_core
  11: _rb_vm_exec
  12: _invoke_block_from_c_bh
  13: _rb_yield_1
  14: _int_dotimes
  15: _vm_call_cfunc_with_frame
  16: _vm_sendish
  17: _vm_exec_core
  18: _rb_vm_exec
  19: _rb_ec_exec_node
  20: _ruby_run_node
  21: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:282:in `get_idx': not yet implemented (fatal)
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:282:in `[]'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:269:in `block in each'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:268:in `times'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:268:in `each'
	from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:11:in `<main>'

Before I traced this to the Parquet file, I had tried to build an equivalent DataFrame by hand in a unit test. That failed too, with a different error:

df = Polars::DataFrame.new(
  {
    a: [[{}], [{}], [{}]],
  },
)
~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/utils.rb:129:in `rescue in rb_type_to_dtype': Conversion of Ruby data type Hash to Polars data type not implemented. (ArgumentError)

        raise ArgumentError, "Conversion of Ruby data type #{data_type} to Polars data type not implemented."
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/utils.rb:126:in `rb_type_to_dtype'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:3779:in `sequence_to_rbseries'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:69:in `initialize'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `new'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `read_hash'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `hash_to_rbdf'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:35:in `initialize'
	from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:26:in `new'
	from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:26:in `<main>'
~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/utils.rb:127:in `fetch': key not found: Hash (KeyError)
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/utils.rb:127:in `rb_type_to_dtype'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:3779:in `sequence_to_rbseries'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/series.rb:69:in `initialize'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `new'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `read_hash'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:4758:in `hash_to_rbdf'
	from ~/.gem/ruby/3.2.1/gems/polars-df-0.4.0-arm64-darwin/lib/polars/data_frame.rb:35:in `initialize'
	from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:26:in `new'
	from ~/Library/Application Support/JetBrains/RubyMine2023.1/scratches/polars.rb:26:in `<main>'
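
The two backtraces above appear to describe one failure rather than two. Judging by the frames at utils.rb:127 and utils.rb:129, a `fetch` on a type table raised `KeyError`, and a `rescue` re-raised it as the `ArgumentError`, so Ruby printed the original as the cause. A minimal sketch of that pattern follows. `TYPE_TABLE` and its dtype symbols are hypothetical stand-ins and are not copied from the gem's source.

```ruby
# Hypothetical stand-in for the gem's Ruby-class-to-dtype table.
# The real table and dtype names may differ; Hash is deliberately absent.
TYPE_TABLE = {
  Integer => :i64,
  Float => :f64,
  String => :str
}.freeze

def rb_type_to_dtype(data_type)
  TYPE_TABLE.fetch(data_type)
rescue KeyError
  # Raising inside a rescue keeps the KeyError reachable through `cause`.
  raise ArgumentError, "Conversion of Ruby data type #{data_type} to Polars data type not implemented."
end

begin
  rb_type_to_dtype({}.class)
rescue ArgumentError => e
  puts e.message
  puts "caused by: #{e.cause.class}: #{e.cause.message}"
end
```

When an exception with a cause goes unhandled, Ruby prints both backtraces, which would explain why the same frames show up twice in the output above.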

ankane commented

Hi @simbasdad, thanks for reporting! Added support for converting lists and structs to Ruby in the commit above. Will take a look at the data frame constructor when I have a chance.
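
In both traces the panic is raised in `RbSeries::to_a`, called while a value is being converted to Ruby. This suggests the list's inner values had a dtype with no conversion branch yet. The real conversion lives in the Rust extension. What follows is only a pure-Ruby model of the idea, using a tagged-value representation (`[:scalar, ...]`, `[:list, ...]`, `[:struct, ...]`) invented for illustration. The field names `state` and `count` are also hypothetical, since the thread does not show the struct's field names. The sketch shows why list and struct conversion have to recurse into each other before a `list[struct[...]]` cell can reach Ruby.

```ruby
# Hypothetical tagged values standing in for Polars' internal values:
#   [:scalar, v], [:list, [values]], [:struct, { name => value }]
def to_ruby(value)
  tag, payload = value
  case tag
  when :scalar
    payload
  when :list
    # Each element may itself be a struct or another list.
    payload.map { |item| to_ruby(item) }
  when :struct
    # Each field may itself be a list or another struct.
    payload.transform_values { |field| to_ruby(field) }
  else
    # Models the "not yet implemented" branch hit in the traces.
    raise NotImplementedError, "no conversion for #{tag}"
  end
end

# One cell of a list[struct] column; field names are made up.
cell = [:list, [
  [:struct, { "state" => [:scalar, "off"], "count" => [:scalar, 255942] }]
]]

p to_ruby(cell)
```

Removing the `:struct` branch from the model reproduces the shape of the original failure: the list branch recurses into an element it cannot convert and raises.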

@ankane Just wanted to say thank you for the quick turnaround. Working with this gem has been a great experience.

We were running into the same problem, and pointing to the master branch fixes it for us. Thank you for the fix @ankane! Any idea when the next version will be released?
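
For anyone wanting to do the same before a release is cut, pointing Bundler at the repository looks roughly like the Gemfile fragment below. The repository path is taken from the page header and the branch name from the comment above. The gem ships a Rust extension, so building from source presumably needs a Rust toolchain. That is an assumption, not something stated in this thread.

```ruby
# Gemfile (sketch): track the unreleased fix on master instead of the 0.4.0 release.
source "https://rubygems.org"

gem "polars-df", github: "ankane/ruby-polars", branch: "master"
```

Switching back to a plain `gem "polars-df"` line once a release contains the fix avoids rebuilding the extension on every install.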

ankane commented

Fixed the constructor in the commit above.

@sambostock There are more changes I'd like to make before 0.5.0, so I don't have a timeline yet.

ankane commented

Got through everything, so just pushed a new release.

@ankane Thanks so much! I was able to successfully switch from red-parquet to polars-df.

I did run into two usability issues that I managed to work around. I was hoping to contribute fixes, but I'm unsure where to begin and whether the changes would even be wanted.

Since I don't want to seem ungrateful, I wanted to check that you're OK with me opening some low-priority issues for your consideration.

ankane commented

Feel free to create issues for feature requests.