Inconsistent use of xarray's open methods
What happened?
Some backends use xr.open_dataset, whereas others use xr.open_mfdataset.
Because of that, our code does not work seamlessly with all datasets.
As xr.open_mfdataset is more general and supports more features (for example the preprocess argument), would it be possible to use it everywhere?
There's also another important downside: xr.open_dataset and xr.open_mfdataset do not behave identically, even when opening a single file. For example, xr.open_mfdataset uses dask by default, whereas xr.open_dataset does not (you have to pass chunks={} explicitly).
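To illustrate the single-file difference, here is a small, self-contained sketch. It assumes xarray, dask and a netCDF backend (netCDF4 or scipy) are installed; the file name `t2m.nc` and the variable `t2m` are made up for the example. A variable's `.chunks` attribute is `None` when it is not backed by dask.

```python
import os
import tempfile

import numpy as np
import xarray as xr


def open_both_ways(path):
    """Open the same single file three ways to compare the resulting backing arrays."""
    plain = xr.open_dataset(path)               # lazily indexed backend arrays, no dask
    chunked = xr.open_dataset(path, chunks={})  # dask-backed, matching open_mfdataset
    multi = xr.open_mfdataset(path)             # dask-backed by default
    return plain, chunked, multi


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "t2m.nc")
    xr.Dataset({"t2m": ("time", np.arange(4.0))}).to_netcdf(path)

    plain, chunked, multi = open_both_ways(path)
    # .chunks is None for non-dask variables and a tuple of chunk sizes otherwise
    results = (plain["t2m"].chunks, chunked["t2m"].chunks, multi["t2m"].chunks)
    print(results)

    # Release file handles before the temporary directory is removed
    for ds in (plain, chunked, multi):
        ds.close()
```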
What are the steps to reproduce the bug?
import earthkit.data
collection_id = "reanalysis-era5-single-levels"
request = {
    "variable": "2t",
    "product_type": "reanalysis",
    "date": "2012-12-01",
    "time": "12:00",
}
kwargs = {"preprocess": lambda ds: ds**2}
nc = earthkit.data.from_source("cds", collection_id, **request, format="netcdf")
nc.to_xarray(xarray_open_mfdataset_kwargs=kwargs) # OK
grib = earthkit.data.from_source("cds", collection_id, **request, format="grib")
grib.to_xarray(xarray_open_mfdataset_kwargs=kwargs)
# TypeError: CfGribBackend.open_dataset() got an unexpected keyword argument 'preprocess'
Version
0.7.0
Platform (OS and architecture)
Linux eqc-quality-tools.eqc.compute.cci1.ecmwf.int 5.14.0-362.8.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 8 17:36:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Relevant log output
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[15], line 16
13 nc.to_xarray(xarray_open_mfdataset_kwargs=kwargs) # OK
15 grib = earthkit.data.from_source("cds", collection_id, **request, format="grib")
---> 16 grib.to_xarray(xarray_open_mfdataset_kwargs=kwargs)
17 # TypeError: CfGribBackend.open_dataset() got an unexpected keyword argument 'preprocess'
File /data/common/miniforge3/envs/wp3/lib/python3.11/site-packages/earthkit/data/readers/grib/xarray.py:138, in XarrayMixIn.to_xarray(self, **kwargs)
125 default.update(self.xarray_open_dataset_kwargs())
127 xarray_open_dataset_kwargs.update(
128 Kwargs(
129 user=user_xarray_open_dataset_kwargs,
(...)
135 )
136 )
--> 138 result = xr.open_dataset(
139 IndexWrapperForCfGrib(self, ignore_keys=ignore_keys),
140 **xarray_open_dataset_kwargs,
141 )
143 return result
File /data/common/miniforge3/envs/wp3/lib/python3.11/site-packages/xarray/backends/api.py:573, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
561 decoders = _resolve_decoders_kwargs(
562 decode_cf,
563 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
569 decode_coords=decode_coords,
570 )
572 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 573 backend_ds = backend.open_dataset(
574 filename_or_obj,
575 drop_variables=drop_variables,
576 **decoders,
577 **kwargs,
578 )
579 ds = _dataset_from_backend_dataset(
580 backend_ds,
581 filename_or_obj,
(...)
591 **kwargs,
592 )
593 return ds
TypeError: CfGribBackend.open_dataset() got an unexpected keyword argument 'preprocess'
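The traceback shows why the GRIB path fails: xr.open_dataset has no preprocess parameter, so the keyword ends up in its **kwargs and is forwarded to backend.open_dataset(), which rejects it. The same mechanism can be reproduced with plain xarray and a local netCDF file (the error then names the netCDF backend instead of CfGribBackend). This is a sketch assuming xarray, dask and a netCDF backend are installed; the file and variable names are illustrative only.

```python
import os
import tempfile

import numpy as np
import xarray as xr


def square(ds):
    """Same preprocessing as in the report: square every variable."""
    return ds**2


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "t2m.nc")
    xr.Dataset({"t2m": ("time", np.arange(3.0))}).to_netcdf(path)

    # open_mfdataset consumes `preprocess` itself and applies it to each file
    with xr.open_mfdataset(path, preprocess=square) as ds:
        squared = ds["t2m"].values.tolist()

    # open_dataset has no `preprocess` parameter: the keyword is forwarded to
    # the backend's open_dataset(), which raises a TypeError
    try:
        xr.open_dataset(path, preprocess=square)
        error = None
    except TypeError as exc:
        error = str(exc)

print(squared)
print(error)
```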
Accompanying data
No response
Organisation
B-Open / CADS-EQC
@malmans2, thank you for reporting this issue. I agree that using xarray_open_mfdataset consistently would be a good idea. This will be fixed in the next release.
Also related to this issue is the following comment from @malmans2 in #375:
Just wanted to provide more details about how we are using it, since you mentioned that we should not import the reader class and that a new method will be added:
if isinstance(earthkit_ds, GRIBReader):
    xr_ds = earthkit_ds.to_xarray(xarray_open_dataset_kwargs={"squeeze": False, "chunks": {}})
elif isinstance(earthkit_ds, CSVReader):
    xr_ds = earthkit_ds.to_xarray(pandas_read_csv_kwargs=...)
elif ...:
    ...
else:
    xr_ds = earthkit_ds.to_xarray(xarray_open_mfdataset_kwargs=...)
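The GRIB branch above passes chunks={} to get the dask-backed result that xr.open_mfdataset returns by default. As a rough sketch of how the two behaviours discussed in this issue (dask-backed variables and preprocess) could be folded into a single xr.open_dataset call, here is a hypothetical helper, `open_single_like_mf`. It is not part of earthkit-data or xarray, and it deliberately ignores everything else open_mfdataset does (combining several files, `parallel`, and so on).

```python
import os
import tempfile

import numpy as np
import xarray as xr


def open_single_like_mf(path, preprocess=None, **kwargs):
    """Hypothetical helper: open one file with xr.open_dataset while mimicking
    two open_mfdataset defaults (dask-backed variables, optional `preprocess`)."""
    kwargs.setdefault("chunks", {})  # open_mfdataset returns dask-backed data by default
    ds = xr.open_dataset(path, **kwargs)
    return preprocess(ds) if preprocess is not None else ds


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "t2m.nc")
    xr.Dataset({"t2m": ("time", np.arange(3.0))}).to_netcdf(path)

    result = open_single_like_mf(path, preprocess=lambda ds: ds**2)
    values = result["t2m"].values.tolist()      # triggers the dask computation
    is_dask = result["t2m"].chunks is not None  # still lazy before .values
    result.close()

print(values, is_dask)
```

Setting `chunks` with `setdefault` keeps the caller free to override it (for example `chunks=None` for eager NumPy-backed data), while applying `preprocess` after opening matches what open_mfdataset does for each file.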