Polars::Binary type
Tseyang opened this issue · 5 comments
Thanks for making this library! I'm playing around with it and trying to read a Parquet file that has a string column that gets encoded as binary data. It seems like the DataFrame that gets created has the column with [binary data]:
[17] pry(main)> df[["content"]]
=> shape: (1035, 1)
┌───────────────┐
│ content       │
│ ---           │
│ binary        │
╞═══════════════╡
│ [binary data] │
│ [binary data] │
│ [binary data] │
│ [binary data] │
│ ...           │
│ [binary data] │
│ [binary data] │
│ [binary data] │
│ [binary data] │
└───────────────┘
I can't seem to find a way to transform this data into its string representation, and everything I've tried (indexing into the Series, to_a.map, apply) seems to indicate that this data type is not properly supported yet:
[18] pry(main)> df["content"][0]
thread '<unnamed>' panicked at 'not yet implemented', ext/polars/src/conversion.rs:164:37
fatal: not yet implemented
from /Users/tyl/.gem/ruby/3.2.1/gems/polars-df-0.3.1-arm64-darwin/lib/polars/series.rb:282:in `get_idx'
I'm wondering if there's an existing way to achieve what I want: to transform a Series holding Polars::Binary data into Polars::Utf8?
Thank you!
Also, df["content"].to_a and df["content"][0] will work in the next release.
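Once to_a works, one possible way to get text back is to re-tag the bytes as UTF-8 on the Ruby side. This is only a sketch: it assumes to_a returns binary-tagged Ruby Strings (with nil for nulls), which the thread does not confirm, so the values array below is a stand-in for df["content"].to_a.

```ruby
# Stand-in for df["content"].to_a. ASSUMPTION: the real call returns
# Ruby Strings tagged as ASCII-8BIT, with nil for null entries.
values = ["h\u00e9llo".b, nil, "w\u00f6rld".b]

# Re-tag a copy of each value as UTF-8. force_encoding changes only the
# encoding label; the underlying bytes are left untouched.
strings = values.map do |v|
  v&.dup&.force_encoding(Encoding::UTF_8)
end

# Bytes that are not valid UTF-8 are not caught by force_encoding,
# so check for them explicitly.
invalid = strings.compact.reject(&:valid_encoding?)
warn "#{invalid.size} value(s) are not valid UTF-8" unless invalid.empty?
```

If the column really holds UTF-8 text, strings now contains ordinary Ruby strings that can be used to build a new Series.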
ah I see. Thanks for the fast turnaround!
@ankane sorry, I had another small question: I don't suppose read_parquet takes in Bytes directly vs. a filename? E.g. if I have a means of obtaining the raw bytes for the Parquet data, can I directly create a DataFrame from it without writing to a Tempfile?
It didn't seem so from the function declaration but maybe I overlooked something.
You can pass an object that responds to read, like:
require "stringio"
io = StringIO.new("binary-data")
df = Polars.read_parquet(io)
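A slightly fuller sketch of the same idea, for the case where the raw bytes are already in a Ruby String. parquet_io is a hypothetical helper name, not part of the library, and the placeholder bytes below are not a valid Parquet file, so the Polars call is left commented out.

```ruby
require "stringio"

# Hypothetical helper: wrap raw Parquet bytes in an in-memory IO object
# so they can be handed to anything that expects an object with #read.
def parquet_io(bytes)
  io = StringIO.new(bytes.b) # .b returns a copy tagged as ASCII-8BIT
  io.binmode                 # keep reads binary, no text transcoding
  io
end

# Placeholder payload for illustration only. A real Parquet file starts
# with the magic bytes "PAR1" but has much more structure after them.
raw = "PAR1 placeholder".b
io = parquet_io(raw)

# With real Parquet bytes and the polars-df gem loaded:
# df = Polars.read_parquet(io)
```

This avoids the Tempfile round trip, since the bytes never leave memory.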