r-lib/nanoparquet

Writer: use smaller page sizes? (medium)


At a minimum, we need to split large columns into multiple pages.
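A minimal sketch of what splitting a column into pages could look like. It assumes fixed-width values (e.g. 8-byte doubles) with no encoding or compression, and ignores page header overhead. `split_into_pages` is a hypothetical helper for illustration, not part of the nanoparquet API.

```python
def split_into_pages(n_rows, bytes_per_value, page_size):
    """Return (start, end) row ranges so each page holds at most page_size bytes.

    Hypothetical helper. Assumes fixed-width, uncompressed values and
    ignores page headers and encoding overhead.
    """
    if bytes_per_value <= 0 or page_size < bytes_per_value:
        raise ValueError("page_size must fit at least one value")
    # Whole values that fit in one page.
    rows_per_page = page_size // bytes_per_value
    # Consecutive row ranges; the last page may be shorter.
    return [(start, min(start + rows_per_page, n_rows))
            for start in range(0, n_rows, rows_per_page)]


# 10 doubles with a 32-byte page limit: 4 values per page.
print(split_into_pages(10, 8, 32))  # [(0, 4), (4, 8), (8, 10)]
```

A real writer would more likely track the encoded size as it goes and cut a page once a byte threshold is crossed, since variable-width or compressed data cannot be sized up front like this.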

DuckDB writes row groups with 122,880 rows.

The Parquet spec suggests row groups of 512MB-1GB. It also suggests a page size of 8KB, which seems way too small to me.

FWIW, Arrow seems to write pages of up to 1MB, so that should surely be fine.
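A rough back-of-the-envelope comparison of the two page sizes, for one column chunk in a DuckDB-sized row group. This is a sketch under stated assumptions: the column is uncompressed, PLAIN-encoded 8-byte doubles, page headers are ignored, and KB/MB are read as 1024 and 1024² bytes. `pages_per_chunk` is a hypothetical helper.

```python
ROW_GROUP_ROWS = 122_880   # DuckDB's row group size, from the thread
BYTES_PER_DOUBLE = 8       # assumption: uncompressed, PLAIN-encoded doubles


def pages_per_chunk(n_rows, bytes_per_value, page_size):
    """Data pages needed for one column chunk (ceiling division).

    Hypothetical helper; ignores page headers and encoding overhead.
    """
    chunk_bytes = n_rows * bytes_per_value
    return -(-chunk_bytes // page_size)


chunk_bytes = ROW_GROUP_ROWS * BYTES_PER_DOUBLE
print(chunk_bytes)                                                       # 983040
print(pages_per_chunk(ROW_GROUP_ROWS, BYTES_PER_DOUBLE, 8 * 1024))       # 120
print(pages_per_chunk(ROW_GROUP_ROWS, BYTES_PER_DOUBLE, 1024 * 1024))    # 1
```

Under these assumptions, the 8KB suggestion would mean 120 pages (and 120 page headers) per double column in each row group, while a 1MB limit fits the whole chunk in a single page.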

Closed by #29.