google/xarray-beam

Require using make_template() if providing a template to ChunksToZarr?

shoyer opened this issue · 2 comments

Currently we support passing an xarray.Dataset full of chunked dask.array objects as template into ChunksToZarr.

This is convenient in simple cases, but it makes it easy to write pipelines that are super slow to set up if you pass in a Dataset with many small chunks (e.g., the default output of xarray.open_zarr()).

The breaking change here would be to require that the template argument be created via make_template(), by checking that each dask.array object in the supplied Dataset consists of only a single chunk. We would also make zarr_chunks required when supplying a template, because it makes no sense to copy chunks from a template that was created with make_template().
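For illustration, the single-chunk check could look roughly like the sketch below. This is not the xarray-beam implementation; the helper names `is_single_chunk` and `validate_template_chunks` are hypothetical. It only assumes the dask convention that an array's `.chunks` is a tuple with one entry per dimension, each entry being a tuple of block sizes, and it works on those plain tuples so it does not depend on dask or xarray.

```python
from typing import Mapping, Optional, Tuple

# Dask-style chunks: one tuple of block sizes per dimension.
# None stands for a variable that is not backed by a dask array.
Chunks = Optional[Tuple[Tuple[int, ...], ...]]


def is_single_chunk(chunks: Tuple[Tuple[int, ...], ...]) -> bool:
    """Return True if there is exactly one block along every dimension."""
    return all(len(dim_chunks) == 1 for dim_chunks in chunks)


def validate_template_chunks(chunks_by_var: Mapping[str, Chunks]) -> None:
    """Hypothetical check: reject templates with multi-chunk dask variables."""
    bad = [
        name
        for name, chunks in chunks_by_var.items()
        # Only chunked (dask-backed) variables are checked.
        if chunks is not None and not is_single_chunk(chunks)
    ]
    if bad:
        raise ValueError(
            f"template variables {bad} have more than one chunk; "
            "create the template with make_template()"
        )


# One block per dimension passes; non-dask variables (None) are skipped.
validate_template_chunks({"temperature": ((100,), (200,)), "mask": None})
```

With a real Dataset, the mapping could presumably be built as `{name: var.chunks for name, var in dataset.data_vars.items()}`, since xarray reports `None` for variables that are not dask-backed.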

As an alternative, we could call make_template() internally inside ChunksToZarr.

I implemented this in #62