Support for striding / rolling windows
mjwillson opened this issue · 1 comments
Nice library, thanks! Will it work to pass e.g. dataset.rolling(...) or dataset.rolling(...).construct(...) to xarray_beam.DatasetToChunks? (I'm assuming not, or that it would result in something hugely inefficient?) If not, would it be possible to support striding / sliding windows in DatasetToChunks?
DatasetToChunks is a rather sharp tool at the moment: you can pass in Dask-backed xarray.Dataset objects and it often works, but you have to be careful to ensure (1) that the dask graph is not too large to serialize and send to each worker, and (2) that it is safe to compute individual chunks from the dataset on separate workers (without running out of memory or doing too much duplicate work).
So yes, you probably could get away with passing the result of dataset.rolling(...).construct(...) into DatasetToChunks, but you would need to take some care.
Ultimately I think it would be nice to have Beam PTransforms for rolling window computation, similar to dask.array.map_overlap:
https://docs.dask.org/en/latest/array-overlap.html
I don't have any immediate plans to work on this but it would definitely be within scope for the project.
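To make the map_overlap idea concrete, here is a rough NumPy-only sketch of the underlying technique: each chunk is extended with a few elements borrowed from its predecessor, so that a trailing rolling mean can be computed on every chunk independently. This is not xarray-beam or Dask API; the function names overlapping_chunks and rolling_mean_chunk, and the chunk and window sizes, are hypothetical and chosen only for this example.

```python
import numpy as np


def overlapping_chunks(values, chunk_size, depth):
    """Yield (halo, padded) pairs.

    `padded` is one chunk of `values` preceded by up to `depth` elements
    borrowed from the previous chunk; `halo` is how many were borrowed.
    """
    for start in range(0, len(values), chunk_size):
        lo = max(start - depth, 0)
        yield start - lo, values[lo : start + chunk_size]


def rolling_mean_chunk(halo, padded, window):
    """Trailing rolling mean for one chunk, using only its own padded data."""
    out = np.full(len(padded) - halo, np.nan)
    for i in range(len(out)):
        j = i + halo  # position of output element i inside `padded`
        if j >= window - 1:  # a full window is available
            out[i] = padded[j - window + 1 : j + 1].mean()
    return out


values = np.arange(10.0)
window = 3

# A trailing window of length 3 needs 2 elements from the previous chunk.
pieces = [
    rolling_mean_chunk(halo, padded, window)
    for halo, padded in overlapping_chunks(values, chunk_size=4, depth=window - 1)
]
chunked_result = np.concatenate(pieces)

# Reference: the same rolling mean computed over the whole array at once.
reference = np.full(len(values), np.nan)
for i in range(window - 1, len(values)):
    reference[i] = values[i - window + 1 : i + 1].mean()
```

In a Beam setting, each (halo, padded) pair would presumably be one element of a PCollection and rolling_mean_chunk would run inside a Map step, so the only data duplicated across workers is the small overlap region.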