Consider omitting unchunked dimensions from Key objects created with DatasetToChunks
shoyer opened this issue · 1 comments
Currently we have (from https://xarray-beam.readthedocs.io/en/latest/read-write.html):
with beam.Pipeline() as p:
    p | xbeam.DatasetToChunks(ds, chunks={'time': 1000}) | beam.MapTuple(print_summary)
Key(offsets={'lat': 0, 'lon': 0, 'time': 0}, vars=None)
with <xarray.Dataset data_vars=['air'] dims={'lat': 25, 'time': 1000, 'lon': 53}>
Key(offsets={'lat': 0, 'lon': 0, 'time': 1000}, vars=None)
with <xarray.Dataset data_vars=['air'] dims={'lat': 25, 'time': 1000, 'lon': 53}>
Key(offsets={'lat': 0, 'lon': 0, 'time': 2000}, vars=None)
with <xarray.Dataset data_vars=['air'] dims={'lat': 25, 'time': 920, 'lon': 53}>
Should we instead omit lat and lon from these keys? This is less explicit but also more flexible, e.g., if you replace these dimensions entirely with different dimensions, you don't need to update the keys.
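For illustration, here is a minimal sketch of what the proposed keys would carry. It uses plain dicts in place of Key.offsets, and drop_unchunked_offsets is a hypothetical helper, not part of Xarray-Beam:

```python
def drop_unchunked_offsets(offsets, chunks):
    """Keep only the offsets for dimensions that are explicitly chunked.

    Hypothetical helper: `offsets` stands in for Key.offsets and `chunks`
    for the `chunks` argument passed to DatasetToChunks.
    """
    return {dim: offset for dim, offset in offsets.items() if dim in chunks}


# Offsets as they appear today for the second chunk in the example above.
current_offsets = {'lat': 0, 'lon': 0, 'time': 1000}

# Under the proposal, only the chunked dimension would remain.
proposed_offsets = drop_unchunked_offsets(current_offsets, chunks={'time': 1000})
print(proposed_offsets)  # {'time': 1000}
```

With keys like this, a transform that swaps lat and lon for other dimensions could pass the key through unchanged.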
One of my original motivations for this is obviated by #50, which now allows us to handle variables in DatasetToChunks even if they don't include "chunked" dimensions.
It's still an open question whether this change would make Xarray-Beam more usable or not.
If we do not make this change, we could potentially enforce the invariant that key.offsets.keys() == dataset.dims.keys(). This might be convenient for writing new transforms.
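A rough sketch of what enforcing that invariant could look like, again with plain dicts standing in for key.offsets and dataset.dims. The function name check_key_matches_dims is hypothetical and where such a check would live in Xarray-Beam is left open:

```python
def check_key_matches_dims(offsets, dims):
    """Raise ValueError unless `offsets` and `dims` name the same dimensions.

    Hypothetical check: `offsets` stands in for key.offsets and `dims`
    for dataset.dims. Order of the dimensions is ignored.
    """
    if set(offsets) != set(dims):
        missing = sorted(set(dims) - set(offsets))
        extra = sorted(set(offsets) - set(dims))
        raise ValueError(
            f'key offsets do not match dataset dimensions: '
            f'missing={missing}, unexpected={extra}'
        )


# Passes silently: every dataset dimension has an offset, as in the
# current DatasetToChunks output shown above.
check_key_matches_dims(
    {'lat': 0, 'lon': 0, 'time': 0},
    {'lat': 25, 'time': 1000, 'lon': 53},
)
```

A key that only carried the time offset would fail this check, which is the trade-off between the two options discussed here.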