r-lib/nanoparquet

Feature request "BufferedOutputStream()"

Closed this issue · 3 comments

Hi,

I believe this library is a very interesting option for replacing Arrow in libraries that read, write, and serialize data in Parquet format. However, I think it would be useful to have an equivalent of Arrow's BufferOutputStream() to avoid writing to disk when the goal is to obtain the raw bytes of the Parquet file.

Regards,

Do you mean that you want the output in a memory buffer, as a raw vector? Or do you actually want to stream the output over HTTP?

Hi @gaborcsardi,

Thanks for your prompt response,

That's correct, I want the output in a memory buffer, as is done with arrow, for example:

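export_parquet <- function(values) {

  # check_installed() (presumably rlang's) expects the package name as a string
  check_installed("arrow", "for source_format = `PARQUET`")

  # Write into an in-memory Arrow output stream instead of a file on disk
  con <- arrow::BufferOutputStream$create()
  defer(con$close())
  arrow::write_parquet(values, con)

  # Convert the stream's buffer to an R raw vector
  as.raw(arrow::buffer(con))

}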

Regards,

You can now call write_parquet(..., ":raw:") to write to a memory buffer; write_parquet() then returns the contents of the Parquet file as a raw vector:

pq <- nanoparquet::write_parquet(mtcars, ":raw:")
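For comparison with the arrow-based helper earlier in the thread, here is a minimal sketch of the same function built on the ":raw:" option. The name export_parquet() is taken from the example above, and the requireNamespace() check is an assumption standing in for check_installed(). The round trip through a temporary file is only there to show that the raw vector is a complete Parquet file.

```r
# Sketch of an in-memory export helper; assumes a nanoparquet version
# that supports the ":raw:" target described above.
export_parquet <- function(values) {
  if (!requireNamespace("nanoparquet", quietly = TRUE)) {
    stop("Package 'nanoparquet' is required for source_format = `PARQUET`")
  }
  # ":raw:" makes write_parquet() return the file contents as a raw vector
  nanoparquet::write_parquet(values, ":raw:")
}

pq <- export_parquet(mtcars)

# A Parquet file starts (and ends) with the 4-byte magic string "PAR1"
is.raw(pq)
rawToChar(pq[1:4])

# Round trip: dump the bytes to a temporary file and read them back
tmp <- tempfile(fileext = ".parquet")
writeBin(pq, tmp)
df <- nanoparquet::read_parquet(tmp)
unlink(tmp)
nrow(df)
```

Since the result is an ordinary raw vector, it can be passed directly to anything that accepts raw bytes, such as an HTTP request body or a database blob column.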