ray-project/ray

[Data] `num_rows_per_file` parameter description is misleading

Closed this issue · 0 comments

Description

The parameter name `num_rows_per_file` suggests that each resulting file contains exactly the specified number of rows, but that isn't the case. For example, the resulting files might contain substantially more rows than requested.
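
To make the mismatch concrete, here is a toy model in plain Python. It is not Ray's implementation: the function `rows_per_file` and the packing rule (whole blocks are grouped until a target is reached, and a block is never split across files) are assumptions made only for illustration. It sketches how a value treated as a target, rather than a strict limit, can yield files with more rows than requested.

```python
from typing import List


def rows_per_file(block_sizes: List[int], target_rows: int) -> List[int]:
    """Hypothetical model, NOT Ray's actual logic.

    Packs whole blocks into a file until the running row count reaches
    target_rows. Blocks are never split, so a file can exceed the target.
    """
    files: List[int] = []
    current = 0
    for size in block_sizes:
        current += size
        if current >= target_rows:
            # Target reached (or overshot): close this file.
            files.append(current)
            current = 0
    if current:
        # Leftover rows go into a final, smaller file.
        files.append(current)
    return files


# Blocks larger than the target: every file overshoots.
print(rows_per_file([250, 250, 250], target_rows=100))  # [250, 250, 250]

# Small blocks: the first file overshoots, the last one falls short.
print(rows_per_file([40, 40, 40, 40], target_rows=100))  # [120, 40]
```

Under this assumed rule, a request for 100 rows per file produces files of 250, 120, or 40 rows depending on the block sizes, which is the kind of behavior the parameter description should spell out.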

Link

e.g., https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.write_parquet.html