Inconsistent `target_chunks` API behavior between Zarr group and Xarray dataset
rabernat opened this issue
The docs say the following about the `target_chunks` argument when rechunking a group:
> For a group of arrays, a dict is required. The keys correspond to array names. The values are `target_chunks` arguments for the array. For example, `{'foo': (20, 10), 'bar': {'x': 3, 'y': 5}, 'baz': None}`. All arrays you want to rechunk must be explicitly named. Arrays that are not present in the `target_chunks` dict will be ignored.
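To make the documented group semantics concrete, here is a small pure-Python model of how a `target_chunks` dict could be resolved into per-array chunks. It does not call rechunker at all: `plan_group` and `normalize_spec` are hypothetical helpers, the array names and shapes are made up, and two behaviors are assumptions on our part: `None` is read as "keep the source chunks", and a dimension missing from a dict-form spec keeps its source chunk size.

```python
# Hypothetical model of the documented group behavior; not rechunker's code.


def normalize_spec(spec, dims, source_chunks):
    """Turn one per-array target_chunks value into a chunk tuple."""
    if spec is None:
        # Assumption: None means "keep the source chunks".
        return tuple(source_chunks)
    if isinstance(spec, dict):
        # Dict form maps dimension name -> chunk size.
        # Assumption: dimensions not named keep their source chunk size.
        return tuple(spec.get(dim, src) for dim, src in zip(dims, source_chunks))
    # Tuple form: one chunk size per dimension, in order.
    return tuple(spec)


def plan_group(arrays, target_chunks):
    """arrays maps name -> (dims, source_chunks).

    Only arrays named in target_chunks end up in the plan; every other
    array in the group is ignored, as the docs describe.
    """
    plan = {}
    for name, spec in target_chunks.items():
        dims, source_chunks = arrays[name]
        plan[name] = normalize_spec(spec, dims, source_chunks)
    return plan


# Hypothetical group: "qux" is deliberately left out of target_chunks.
arrays = {
    "foo": (("x", "y"), (100, 100)),
    "bar": (("x", "y"), (100, 100)),
    "baz": (("x",), (50,)),
    "qux": (("x",), (50,)),
}
target_chunks = {"foo": (20, 10), "bar": {"x": 3, "y": 5}, "baz": None}
plan = plan_group(arrays, target_chunks)
print(plan)  # {'foo': (20, 10), 'bar': (3, 5), 'baz': (50,)}
```

Note that `qux` never appears in the plan: under the documented group semantics it is silently dropped rather than copied.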
Xarray datasets are very similar to Zarr groups. However, rechunking an Xarray dataset behaves a bit differently. This difference is documented in the tests, but not in the docs. The `target_chunks` parameter for `test_rechunk_dataset` is defined in `rechunker/tests/test_rechunk.py`, lines 49 to 50 at commit f55a475.
Note that the variable `c` is not present in that parameter. However, it is present in the output dataset (`rechunker/tests/test_rechunk.py`, lines 123 to 125 at commit f55a475), with its original chunks preserved, which is a reasonable default.
We should strive to reconcile, or at least document, this difference. My personal preference would be to change the API so that a flat Zarr group behaves the same as an Xarray dataset: variables that are not mentioned in `target_chunks` simply get passed through with identical chunks.
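The proposed pass-through rule can be sketched the same way. This is a hypothetical, library-free model, not rechunker's implementation: `plan_with_passthrough` is an illustrative name, the variables and chunk sizes are invented, and it assumes that an unmentioned variable and an explicit `None` both mean "keep the source chunks". As an extra design choice of this sketch, names in `target_chunks` that match no variable raise an error instead of being ignored.

```python
# Hypothetical model of the proposed pass-through behavior; not rechunker's code.


def plan_with_passthrough(arrays, target_chunks):
    """arrays maps name -> (dims, source_chunks).

    Every variable appears in the plan. Variables not named in
    target_chunks keep their source chunks unchanged.
    """
    # Design choice of this sketch: reject names that match no variable.
    unknown = set(target_chunks) - set(arrays)
    if unknown:
        raise KeyError(f"target_chunks names unknown arrays: {sorted(unknown)}")

    plan = {}
    for name, (dims, source_chunks) in arrays.items():
        spec = target_chunks.get(name)
        if spec is None:
            # Not mentioned (or explicitly None): pass through unchanged.
            plan[name] = tuple(source_chunks)
        elif isinstance(spec, dict):
            # Dict form: dimension name -> chunk size; others keep source size.
            plan[name] = tuple(spec.get(d, s) for d, s in zip(dims, source_chunks))
        else:
            plan[name] = tuple(spec)
    return plan


# Hypothetical variables: "c" is not mentioned in target_chunks.
arrays = {
    "a": (("x", "y"), (100, 100)),
    "b": (("x",), (100,)),
    "c": (("y",), (25,)),
}
plan = plan_with_passthrough(arrays, {"a": (20, 10), "b": {"x": 50}})
print(plan)  # {'a': (20, 10), 'b': (50,), 'c': (25,)}
```

Here `c` is carried through with its original chunks, mirroring what the dataset test shows for Xarray.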
cc @eric-czech, who wrote `test_rechunk_dataset` and so probably understands this part of the code best.