ecmwf/cfgrib

Unable to merge multiple Grib files with specified variable name

meteoDaniel opened this issue · 0 comments

What happened?

I am opening a list of GRIB files (AROME Météo-France SP2 GRIB packages), and when I specify a shortName or the name of the variable in filter_by_keys, I receive an xarray.MergeError. But when I open multiple variables by specifying only {'stepType': 'instant'}, everything works fine.

This behaviour is very curious, and I do not know how to debug this issue.

What are the steps to reproduce the bug?

In [2]: self.files_per_grib_package[grib_package_to_use]
Out[2]: 
[PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_0.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_1.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_2.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_3.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_4.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_5.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_6.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_7.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_8.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_9.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_10.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_11.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_12.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_13.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_14.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_15.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_16.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_17.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_18.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_19.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_20.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_21.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_22.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_23.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_24.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_25.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_26.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_27.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_28.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_29.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_30.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_31.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_32.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_33.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_34.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_35.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_36.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_37.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_38.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_39.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_40.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_41.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_42.grib2')]

In [3]: data = xarray.open_mfdataset(
   ...:                 self.files_per_grib_package[grib_package_to_use],
   ...:                 engine="cfgrib",
   ...:                 parallel=True,
   ...:                 concat_dim="step",
   ...:                 combine="nested",
   ...:                 backend_kwargs={
   ...:                     "indexpath": "",
   ...:                     "errors": "ignore",
   ...:                     "filter_by_keys":  {'shortName': 'lcc', 'stepType': 'instant'}
   ...:                     # "filter_by_keys": FILTER_ARGUMENT[variable],
   ...:                 },
   ...:             )
---------------------------------------------------------------------------
MergeError                                Traceback (most recent call last)
Cell In[3], line 1
----> 1 data = xarray.open_mfdataset(
      2                 self.files_per_grib_package[grib_package_to_use],
      3                 engine="cfgrib",
      4                 parallel=True,
      5                 concat_dim="step",
      6                 combine="nested",
      7                 backend_kwargs={
      8                     "indexpath": "",
      9                     "errors": "ignore",
     10                     "filter_by_keys":  {'shortName': 'lcc', 'stepType': 'instant'}
     11                     # "filter_by_keys": FILTER_ARGUMENT[variable],
     12                 },
     13             )

File /usr/local/lib/python3.10/site-packages/xarray/backends/api.py:1071, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
   1067 try:
   1068     if combine == "nested":
   1069         # Combined nested list by successive concat and merge operations
   1070         # along each dimension, using structure given by "ids"
-> 1071         combined = _nested_combine(
   1072             datasets,
   1073             concat_dims=concat_dim,
   1074             compat=compat,
   1075             data_vars=data_vars,
   1076             coords=coords,
   1077             ids=ids,
   1078             join=join,
   1079             combine_attrs=combine_attrs,
   1080         )
   1081     elif combine == "by_coords":
   1082         # Redo ordering from coordinates, ignoring how they were ordered
   1083         # previously
   1084         combined = combine_by_coords(
   1085             datasets,
   1086             compat=compat,
   (...)
   1090             combine_attrs=combine_attrs,
   1091         )

File /usr/local/lib/python3.10/site-packages/xarray/core/combine.py:356, in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join, combine_attrs)
    353 _check_shape_tile_ids(combined_ids)
    355 # Apply series of concatenate or merge operations along each dimension
--> 356 combined = _combine_nd(
    357     combined_ids,
    358     concat_dims,
    359     compat=compat,
    360     data_vars=data_vars,
    361     coords=coords,
    362     fill_value=fill_value,
    363     join=join,
    364     combine_attrs=combine_attrs,
    365 )
    366 return combined

File /usr/local/lib/python3.10/site-packages/xarray/core/combine.py:232, in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join, combine_attrs)
    228 # Each iteration of this loop reduces the length of the tile_ids tuples
    229 # by one. It always combines along the first dimension, removing the first
    230 # element of the tuple
    231 for concat_dim in concat_dims:
--> 232     combined_ids = _combine_all_along_first_dim(
    233         combined_ids,
    234         dim=concat_dim,
    235         data_vars=data_vars,
    236         coords=coords,
    237         compat=compat,
    238         fill_value=fill_value,
    239         join=join,
    240         combine_attrs=combine_attrs,
    241     )
    242 (combined_ds,) = combined_ids.values()
    243 return combined_ds

File /usr/local/lib/python3.10/site-packages/xarray/core/combine.py:267, in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join, combine_attrs)
    265     combined_ids = dict(sorted(group))
    266     datasets = combined_ids.values()
--> 267     new_combined_ids[new_id] = _combine_1d(
    268         datasets, dim, compat, data_vars, coords, fill_value, join, combine_attrs
    269     )
    270 return new_combined_ids

File /usr/local/lib/python3.10/site-packages/xarray/core/combine.py:290, in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join, combine_attrs)
    288 if concat_dim is not None:
    289     try:
--> 290         combined = concat(
    291             datasets,
    292             dim=concat_dim,
    293             data_vars=data_vars,
    294             coords=coords,
    295             compat=compat,
    296             fill_value=fill_value,
    297             join=join,
    298             combine_attrs=combine_attrs,
    299         )
    300     except ValueError as err:
    301         if "encountered unexpected variable" in str(err):

File /usr/local/lib/python3.10/site-packages/xarray/core/concat.py:250, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    238     return _dataarray_concat(
    239         objs,
    240         dim=dim,
   (...)
    247         combine_attrs=combine_attrs,
    248     )
    249 elif isinstance(first_obj, Dataset):
--> 250     return _dataset_concat(
    251         objs,
    252         dim=dim,
    253         data_vars=data_vars,
    254         coords=coords,
    255         compat=compat,
    256         positions=positions,
    257         fill_value=fill_value,
    258         join=join,
    259         combine_attrs=combine_attrs,
    260     )
    261 else:
    262     raise TypeError(
    263         "can only concatenate xarray Dataset and DataArray "
    264         f"objects, got {type(first_obj)}"
    265     )

File /usr/local/lib/python3.10/site-packages/xarray/core/concat.py:524, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    518 if variables_to_merge:
    519     grouped = {
    520         k: v
    521         for k, v in collect_variables_and_indexes(datasets).items()
    522         if k in variables_to_merge
    523     }
--> 524     merged_vars, merged_indexes = merge_collected(
    525         grouped, compat=compat, equals=equals
    526     )
    527     result_vars.update(merged_vars)
    528     result_indexes.update(merged_indexes)

File /usr/local/lib/python3.10/site-packages/xarray/core/merge.py:290, in merge_collected(grouped, prioritized, compat, combine_attrs, equals)
    288 variables = [variable for variable, _ in elements_list]
    289 try:
--> 290     merged_vars[name] = unique_variable(
    291         name, variables, compat, equals.get(name, None)
    292     )
    293 except MergeError:
    294     if compat != "minimal":
    295         # we need more than "minimal" compatibility (for which
    296         # we drop conflicting coordinates)

File /usr/local/lib/python3.10/site-packages/xarray/core/merge.py:144, in unique_variable(name, variables, compat, equals)
    141                 break
    143 if not equals:
--> 144     raise MergeError(
    145         f"conflicting values for variable {name!r} on objects to be combined. "
    146         "You can skip this check by specifying compat='override'."
    147     )
    149 if combine_method:
    150     for var in variables[1:]:

MergeError: conflicting values for variable 'valid_time' on objects to be combined. You can skip this check by specifying compat='override'.

In [4]: data = xarray.open_mfdataset(
   ...:                 self.files_per_grib_package[grib_package_to_use],
   ...:                 engine="cfgrib",
   ...:                 parallel=True,
   ...:                 concat_dim="step",
   ...:                 combine="nested",
   ...:                 backend_kwargs={
   ...:                     "indexpath": "",
   ...:                     "errors": "ignore",
   ...:                     "filter_by_keys":  {'stepType': 'instant'}
   ...:                     # "filter_by_keys": FILTER_ARGUMENT[variable],
   ...:                 },
   ...:             )

In [5]: data
Out[5]: 
<xarray.Dataset> Size: 5GB
Dimensions:     (step: 43, latitude: 1791, longitude: 2801)
Coordinates:
    time        datetime64[ns] 8B 2024-06-12
  * step        (step) timedelta64[ns] 344B 00:00:00 ... 1 days 18:00:00
    surface     float64 8B 0.0
  * latitude    (latitude) float64 14kB 55.4 55.39 55.38 ... 37.52 37.51 37.5
  * longitude   (longitude) float64 22kB -12.0 -11.99 -11.98 ... 15.99 16.0
    valid_time  (step) datetime64[ns] 344B 2024-06-12 ... 2024-06-13T18:00:00
    level       float64 8B 0.0
Data variables:
    sp          (step, latitude, longitude) float32 863MB dask.array<chunksize=(1, 1791, 2801), meta=np.ndarray>
    CAPE_INS    (step, latitude, longitude) float32 863MB dask.array<chunksize=(1, 1791, 2801), meta=np.ndarray>
    lcc         (step, latitude, longitude) float32 863MB dask.array<chunksize=(2, 1791, 2801), meta=np.ndarray>
    hcc         (step, latitude, longitude) float32 863MB dask.array<chunksize=(2, 1791, 2801), meta=np.ndarray>
    mcc         (step, latitude, longitude) float32 863MB dask.array<chunksize=(2, 1791, 2801), meta=np.ndarray>
    unknown     (step, latitude, longitude) float32 863MB dask.array<chunksize=(2, 1791, 2801), meta=np.ndarray>
Attributes:
    GRIB_edition:            2
    GRIB_centre:             lfpw
    GRIB_centreDescription:  French Weather Service - Toulouse
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             French Weather Service - Toulouse
    history:                 2024-06-12T05:53 GRIB to CDM+CF via cfgrib-0.9.1...


Version

0.9.12.0

Platform (OS and architecture)

python3.10-slim Docker image

Relevant log output

No response

Accompanying data

https://mf-models-on-aws.org/#arome-france-hd/v1/2024-06-12/00/SP2/

Organisation

No response