[feature request] read parquet from URL (or from raw vector?)
tanho63 opened this issue · 5 comments
tanho63 commented
Hi! Excited by the looks of this package. A frequent use case I have is reading a Parquet file from a URL, e.g.
arrow::read_parquet("https://github.com/nflverse/nflverse-data/releases/download/pbp/play_by_play_2023.parquet")
Is this something that would be in-scope for nanoparquet?
gaborcsardi commented
Yes, we could definitely do one or both of those. The challenge with HTTP support is keeping the package lean, but reading from a raw vector is pretty straightforward. write_parquet() already supports writing to a raw vector.
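For reference, a minimal sketch of the raw vector side as it stands today, assuming the documented `":raw:"` file name for write_parquet(). The example data frame is made up, and the read still goes through a temporary file, since reading directly from a raw vector is only the proposal here:

```r
library(nanoparquet)

# Made-up example data, purely for illustration.
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# Passing ":raw:" as the file name returns the Parquet bytes as a raw vector
# instead of writing to disk.
buf <- write_parquet(df, ":raw:")

# Until read_parquet() accepts a raw vector, the bytes can be written to a
# temporary file and read back from there.
tmp <- tempfile(fileext = ".parquet")
writeBin(buf, tmp)
back <- read_parquet(tmp)
unlink(tmp)
```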
Btw, we could also support reading from an R connection; then you could do
read_parquet(url("https://...."))
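Until something like that exists, a workaround along these lines should do. This is only a sketch: `read_parquet_url()` is a hypothetical helper name, not part of nanoparquet, and it assumes base R's `download.file()` can reach the URL.

```r
# Hypothetical helper: download the file to a temporary path, then read it
# with the existing file-based read_parquet().
read_parquet_url <- function(url) {
  tmp <- tempfile(fileext = ".parquet")
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" keeps the binary content intact on Windows.
  utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
  nanoparquet::read_parquet(tmp)
}
```

Usage would then be `pbp <- read_parquet_url("https://....")`. This keeps HTTP handling out of the package itself, at the cost of a round trip through disk.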
tanho63 commented
Either of these would be great!
mrcaseb commented
Reading from a connection would be great, as that's how we read RDS files from a URL!