r-lib/nanoparquet

[feature request] read parquet from URL (or from raw vector?)

tanho63 opened this issue · 5 comments

Hi! Excited by the looks of this package. A frequent use case I have is reading a parquet file from a URL, e.g.

arrow::read_parquet("https://github.com/nflverse/nflverse-data/releases/download/pbp/play_by_play_2023.parquet")

Is this something that would be in-scope for nanoparquet?

Yes, we could definitely do one or both of those. The challenge with HTTP support is keeping the package lean, but reading from a raw vector is pretty straightforward. write_parquet() already supports writing to a raw vector.

Btw, we could also support reading from an R connection; then you could do

read_parquet(url("https://...."))

Either of these would be great!

Reading from a connection would be great, as that's how we read rds files from a URL!
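
For illustration, here is a rough sketch of how the two ideas in this thread fit together: draining an R connection into a raw vector, which a raw-vector reader could then parse. It uses base R only. The helper name `read_all_bytes()` and its `chunk_size` argument are hypothetical and not part of nanoparquet, and the sketch does not assume that read_parquet() accepts a raw vector or a connection.

```r
# Hypothetical helper (not part of nanoparquet): read every byte from a
# binary connection into a single raw vector.
read_all_bytes <- function(con, chunk_size = 65536L) {
  # Open the connection in binary mode if the caller has not done so,
  # and close it again on exit in that case.
  if (!isOpen(con)) {
    open(con, "rb")
    on.exit(close(con), add = TRUE)
  }
  chunks <- list()
  repeat {
    # readBin() returns a zero-length vector once the stream is exhausted.
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
  }
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}

# Self-contained demo on an in-memory connection, so no network is needed.
demo_bytes <- as.raw(1:10)
con <- rawConnection(demo_bytes)
bytes <- read_all_bytes(con, chunk_size = 4L)
close(con)
```

The same helper should work on a `url()` connection, since it opens unopened connections in binary mode. Until one of the proposals above lands, a workaround that relies only on reading from a file path is to fetch the file with `download.file(..., mode = "wb")` into a `tempfile()` and call read_parquet() on that path.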