Support DataTree for organizing Datasets by type of level
jthielen opened this issue · 4 comments
As discussed in xarray-contrib/datatree#195, it would be wonderful (and relatively straightforward) to add support for DataTree in cfgrib. This would allow the different datasets previously returned from cfgrib.open_datasets() to be organized in a single data collection.
As for implementation, I would propose refactoring the existing open_datasets() into something like:
def open_datatree(path, backend_kwargs={}, **kwargs):
    # type: (str, T.Dict[str, T.Any], T.Any) -> datatree.DataTree
    """
    Open a GRIB file grouping incompatible hypercubes to different datasets via simple heuristics.
    """
    squeeze = backend_kwargs.get("squeeze", True)
    # Work on a copy so the caller's dict (and the shared default) is never mutated.
    backend_kwargs = backend_kwargs.copy()
    backend_kwargs["squeeze"] = False
    datasets = open_variable_datasets(path, backend_kwargs=backend_kwargs, **kwargs)

    # Group the per-variable datasets by GRIB typeOfLevel.
    type_of_level_datasets = {}  # type: T.Dict[str, T.List[xr.Dataset]]
    for ds in datasets:
        for _, da in ds.data_vars.items():
            type_of_level = da.attrs.get("GRIB_typeOfLevel", "undef")
            type_of_level_datasets.setdefault(type_of_level, []).append(ds)

    return datatree.DataTree.from_dict(type_of_level_datasets)
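One detail the snippet above glosses over: as far as I know, `DataTree.from_dict` expects each path to map to a single dataset (or node), not a list, so the per-level lists would need flattening first. A minimal, stdlib-only sketch of one way to do that is below. Plain dicts stand in for `xr.Dataset`, and `group_by_type_of_level`, `to_tree_paths`, and the `<typeOfLevel>/<index>` naming are all hypothetical, not part of cfgrib or datatree:

```python
def group_by_type_of_level(datasets):
    """Group dataset stand-ins by the GRIB_typeOfLevel of their variables."""
    groups = {}
    for ds in datasets:
        for attrs in ds["data_vars"].values():
            type_of_level = attrs.get("GRIB_typeOfLevel", "undef")
            bucket = groups.setdefault(type_of_level, [])
            # Avoid adding the same dataset twice if it holds several variables.
            if not any(ds is seen for seen in bucket):
                bucket.append(ds)
    return groups


def to_tree_paths(groups):
    """Flatten {type_of_level: [ds, ...]} into {"type_of_level/index": ds}."""
    paths = {}
    for type_of_level in sorted(groups):
        for index, ds in enumerate(groups[type_of_level]):
            paths["{}/{}".format(type_of_level, index)] = ds
    return paths


# Plain dicts standing in for single-variable xr.Dataset objects (assumption).
t2m = {"data_vars": {"t2m": {"GRIB_typeOfLevel": "heightAboveGround"}}}
u10 = {"data_vars": {"u10": {"GRIB_typeOfLevel": "heightAboveGround"}}}
t = {"data_vars": {"t": {"GRIB_typeOfLevel": "isobaricInhPa"}}}
misc = {"data_vars": {"x": {}}}  # no GRIB_typeOfLevel attribute -> "undef"

tree_paths = to_tree_paths(group_by_type_of_level([t2m, u10, t, misc]))
print(sorted(tree_paths))
```

With real datasets, the resulting one-dataset-per-path mapping is the shape that could then be handed to `from_dict`; whether to merge compatible datasets under one node first is a separate design choice.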
Then, open_datasets could be re-implemented as something like:
def open_datasets(path, backend_kwargs={}, **kwargs):
    # open_datatree() forces squeeze=False internally, so apply the caller's preference here.
    squeeze = backend_kwargs.get("squeeze", True)
    type_of_level_datasets = open_datatree(path, backend_kwargs=backend_kwargs, **kwargs)

    merged = []  # type: T.List[xr.Dataset]
    for type_of_level in sorted(type_of_level_datasets):
        for ds in merge_datasets(type_of_level_datasets[type_of_level], join="exact"):
            merged.append(ds.squeeze() if squeeze else ds)
    return merged
(these snippets were edited quickly in between conference sessions; no guarantee that I didn't miss something or that they work properly as-is)
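Both functions need to know the caller's `squeeze` preference while always passing `squeeze=False` down to the low-level reader. A small sketch of that pattern in isolation, assuming a hypothetical helper named `split_squeeze` (not an existing cfgrib function); the `indexpath` key is only there as an example of another backend option being passed through untouched:

```python
def split_squeeze(backend_kwargs=None):
    """Return (caller_squeeze, reader_kwargs) without mutating the input."""
    # Copy first so neither the caller's dict nor a shared default is modified.
    reader_kwargs = dict(backend_kwargs or {})
    caller_squeeze = reader_kwargs.get("squeeze", True)
    # The reader always gets unsqueezed data; squeezing happens after merging.
    reader_kwargs["squeeze"] = False
    return caller_squeeze, reader_kwargs


user_kwargs = {"squeeze": True, "indexpath": ""}
caller_squeeze, reader_kwargs = split_squeeze(user_kwargs)
default_squeeze, default_reader_kwargs = split_squeeze()

print(caller_squeeze, reader_kwargs["squeeze"], user_kwargs["squeeze"])
```

Taking `None` as the default instead of `{}` is just a habit to sidestep the shared-mutable-default pitfall; the copy alone already makes the original signature safe.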
This all being said, discussions would likely need to happen to decide whether this should be supported before or after integration of DataTree into xarray proper (xref pydata/xarray#7418).