Introduce "dataset iterator" as possible return type for data stores
Closed this issue · 0 comments
Is your feature request related to a problem? Please describe.
Large datasets are often processed slice-by-slice along a given dimension.
Most xcube data stores return an N-D data cube from store.open_data(),
which must then be programmatically subdivided into the desired slices. However, many data stores already compose the returned cube from such smaller slices, e.g. from NetCDF files, one per time stamp. For these stores, the natural and most efficient approach would be to return the slices directly as an iterator of datasets.
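To illustrate the current situation, the slice-by-slice pattern looks roughly like this (a minimal sketch using plain xarray; the cube contents and variable names are made up for demonstration):

```python
import numpy as np
import xarray as xr

# A small stand-in for an N-D cube as returned by store.open_data():
# 3 time steps of a 2x2 spatial grid.
cube = xr.Dataset(
    {"sst": (("time", "y", "x"), np.arange(12.0).reshape(3, 2, 2))}
)

# Today, callers must subdivide the cube themselves:
slices = [cube.isel(time=i) for i in range(cube.sizes["time"])]
# A dataset iterator would let the store yield such slices directly,
# avoiding re-composition of an N-D cube that was built from slices.
```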
Describe the solution you'd like
Introduce a new store data type "dsiter" whose data instances implement a Python interface DatasetIterator:
```python
from abc import ABC
from collections.abc import Iterator, Sized
import xarray as xr

class DatasetIterator(Iterator, Sized, ABC):
    """Interface representing a dataset iterator."""
    def __next__(self) -> xr.Dataset:
        """Yield the next dataset."""
```
and
```python
DATASET_ITERATOR_TYPE = DataType(
    DatasetIterator, ["dsiter", "xcube.core.store.DatasetIterator"]
)
DataType.register_data_type(DATASET_ITERATOR_TYPE)
```
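For illustration, a concrete implementation of the proposed interface might look as follows. This is only a sketch: `SliceIterator`, its constructor parameters, and the slicing logic are hypothetical and not part of the proposal itself.

```python
from abc import ABC
from collections.abc import Iterator, Sized
import numpy as np
import xarray as xr

class DatasetIterator(Iterator, Sized, ABC):
    """Interface representing a dataset iterator (as proposed above)."""

class SliceIterator(DatasetIterator):
    """Hypothetical iterator that yields one slice of a cube per step."""

    def __init__(self, cube: xr.Dataset, dim: str = "time"):
        self._cube = cube
        self._dim = dim
        self._index = 0

    def __len__(self) -> int:
        # Number of slices along the iteration dimension.
        return self._cube.sizes[self._dim]

    def __next__(self) -> xr.Dataset:
        if self._index >= len(self):
            raise StopIteration
        dataset = self._cube.isel({self._dim: self._index})
        self._index += 1
        return dataset
```

Since `collections.abc.Iterator` supplies `__iter__()`, such an object can be consumed with a plain `for` loop or `list()`, and `len()` reports the number of slices up front.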
Additional context
The concept has already been implemented successfully in the "smos" data store provided by the xcube-smos plugin, which uses the opener identifier "dsiter:smos:zarr".