ankane/ruby-polars

Segfault after GC when data frame contains Ruby objects

q3aiml opened this issue · 2 comments

Thanks for bringing Polars to Ruby!

I can get Ruby to reliably crash when putting objects into a dataframe and performing garbage collection:

irb(main):004> Polars::VERSION
=> "0.9.0"
irb(main):005> df = Polars::DataFrame.new({ c: [Object.new, Object.new, Object.new] },  schema: {'c' => Polars::Object})
=> 
shape: (3, 1)
...
irb(main):006> df
=> 
shape: (3, 1)
┌──────────────────────────────┐
│ c                            │
│ ---                          │
│ object                       │
╞══════════════════════════════╡
│ #<Object:0x0000000126aa8008> │
│ #<Object:0x0000000126aa7fb8> │
│ #<Object:0x0000000126aa7f68> │
└──────────────────────────────┘
irb(main):007> GC.start
=> nil
irb(main):008> df
[ segfault ]

In more natural use cases I have also seen this show up first as data corruption, with arbitrary other objects replacing the ones originally stored in the data frame.

Ruby version: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]

I don't have much experience here and haven't had a chance to dig in, but I imagine the objects need to be marked as still referenced from outside Ruby, with something like rb_gc_mark?
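
To illustrate the suspicion above, here is a minimal pure-Ruby sketch of the general mechanism. It is an analogy, not how ruby-polars stores objects. Both `UnmarkedStore` and `PinnedStore` are hypothetical classes made up for this example. The first one remembers objects only through weak references, standing in for a native container the GC cannot see into. The second one also holds a strong reference, which is roughly the effect that marking would have.

```ruby
require "weakref"

# Hypothetical stand-in for a native container: it remembers objects in a way
# the GC does not treat as a reference (modelled here with WeakRef).
class UnmarkedStore
  def initialize(objects)
    @refs = objects.map { |obj| WeakRef.new(obj) }
  end

  # Number of stored objects that have not been collected yet.
  def alive_count
    @refs.count(&:weakref_alive?)
  end
end

# Same store, but it also keeps a strong reference, so the GC marks the
# objects as reachable for as long as the store itself is alive.
class PinnedStore < UnmarkedStore
  def initialize(objects)
    super
    @pinned = objects
  end
end

# Build the objects inside a method so no local variable outlives the call.
def build(store_class)
  store_class.new(Array.new(3) { Object.new })
end

pinned = build(PinnedStore)
unmarked = build(UnmarkedStore)

GC.start

puts "pinned:   #{pinned.alive_count} alive"   # always 3
puts "unmarked: #{unmarked.alive_count} alive" # often 0, but not guaranteed
```

The unmarked count depends on GC timing and conservative stack scanning, so it may not drop to zero on every run. Whether keeping a separate Ruby-side reference to the objects is enough to avoid the crash in the data frame case is an assumption that has not been verified here.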

Hi @q3aiml, thanks for reporting! I am able to reproduce this, but will need some time to figure out how to resolve it.