Introduce "dataset iterator" as possible return type for data stores
Closed this issue · 0 comments
Is your feature request related to a problem? Please describe.
Large datasets are often processed slice-by-slice along a given dimension.
Most xcube data stores return an N-D data cube from store.open_data(),
which must then be programmatically subdivided into the desired slices. However, many data stores already compose the returned cube from such smaller slices, e.g. from NetCDF files, one per time stamp. For these stores, the natural and most efficient approach would be to return the slices directly as an iterator of datasets.
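To illustrate the current situation, the slice-by-slice pattern looks roughly like this (a minimal sketch using plain xarray; the cube contents and variable names are made up for demonstration):

```python
import numpy as np
import xarray as xr

# A small stand-in for an N-D cube as returned by store.open_data():
# 3 time steps of a 2x2 spatial grid.
cube = xr.Dataset(
    {"sst": (("time", "y", "x"), np.arange(12.0).reshape(3, 2, 2))}
)

# Today, callers must subdivide the cube themselves:
slices = [cube.isel(time=i) for i in range(cube.sizes["time"])]
# A dataset iterator would let the store yield such slices directly,
# avoiding re-composition of an N-D cube that was built from slices.
```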
Describe the solution you'd like
Introduce a new store data type "dsiter" whose data instances implement a Python interface DatasetIterator:
```python
from abc import ABC
from collections.abc import Iterator, Sized
import xarray as xr

class DatasetIterator(Iterator, Sized, ABC):
    """Interface representing a dataset iterator."""
    def __next__(self) -> xr.Dataset:
        """Yield the next dataset."""
```
and
```python
DATASET_ITERATOR_TYPE = DataType(
    DatasetIterator, ["dsiter", "xcube.core.store.DatasetIterator"]
)
DataType.register_data_type(DATASET_ITERATOR_TYPE)
```
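For illustration, a concrete implementation of the proposed interface might look as follows. This is only a sketch: `SliceIterator`, its constructor parameters, and the slicing logic are hypothetical and not part of the proposal itself.

```python
from abc import ABC
from collections.abc import Iterator, Sized
import numpy as np
import xarray as xr

class DatasetIterator(Iterator, Sized, ABC):
    """Interface representing a dataset iterator (as proposed above)."""

class SliceIterator(DatasetIterator):
    """Hypothetical iterator that yields one slice of a cube per step."""

    def __init__(self, cube: xr.Dataset, dim: str = "time"):
        self._cube = cube
        self._dim = dim
        self._index = 0

    def __len__(self) -> int:
        # Number of slices along the iteration dimension.
        return self._cube.sizes[self._dim]

    def __next__(self) -> xr.Dataset:
        if self._index >= len(self):
            raise StopIteration
        dataset = self._cube.isel({self._dim: self._index})
        self._index += 1
        return dataset
```

Since `collections.abc.Iterator` supplies `__iter__()`, such an object can be consumed with a plain `for` loop or `list()`, and `len()` reports the number of slices up front.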
Additional context
The concept has already been implemented successfully in the "smos" data store provided by the xcube-smos plugin, which uses the opener identifier "dsiter:smos:zarr".