ecmwf/cfgrib

failed to set key 'endStep' in to_grib

jiaobf opened this issue · 7 comments

I read the temperature from GFS and tried to write the data to a new GRIB file .

from eccodes import *
import numpy as np
import xarray as xr 
from cfgrib.xarray_to_grib import to_grib

filename = "gfs.2022010500f24"
ds = xr.open_dataset(filename, engine='cfgrib',
                     backend_kwargs={'filter_by_keys': {'typeOfLevel': 'isobaricInhPa', 
                                                        'name'       : 'Temperature'}})

to_grib(ds, 'out1.grib', grib_keys={'edition': 2})

Running the script directly will result in an error

KeyError: 1641340800000000000

following #272 , it works with a warning :

failed to set key 'endStep' to numpy.timedelta64(24,'ns')

and the time information in the grib file is incorrect

1:0:d=2022010500:TMP Temperature [K]:1000 mb:anl:

which should be

542:117403445:d=2022010500:TMP Temperature [K]:1000 mb:24 hour fcst:

Environment:
Python 3.8 from conda-forge on centos7

pandas 1.3.4
cfgrib 0.9.9.1
xarray 0.20.1

Hello, where could we find the gfs.2022010500f24 file ?

Hello, where could we find the gfs.2022010500f24 file ?

Thank you for your attention. GFS file can be download with

wget https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20220228/00/atmos/gfs.t00z.pgrb2.0p50.f024

in which gfs.YYYYMMDD/00 can be set to any day in the past week.

This should be fixed with the release of the next eccodes-python, which now accepts numpy.int64 for setting keys: https://github.com/ecmwf/eccodes-python/commits/develop You can either try the develop branch of eccodes-python, or wait for the release, which should be in a couple of days.

As mentioned in this issue #267 (but things changed a bit since ...) it fails when pandas function encounters timestamp (or timedelta) int/long values.

You can fix it (waiting for the new version of cfgrib) by updating the convert_label_indexer function in xarray/core/indexing.py file and adding the few lines as illustrated below (with # ADDED LINE comment) :

`def convert_label_indexer(index, label, index_name="", method=None, tolerance=None):
"""Given a pandas.Index and labels (e.g., from getitem) for one
dimension, return an indexer suitable for indexing an ndarray along that
dimension. If 'index' is a pandas.MultiIndex and depending on 'label',
return a new pandas.Index or pandas.MultiIndex (otherwise return None).
"""
new_index = None

if isinstance(label, slice):
    if method is not None or tolerance is not None:
        raise NotImplementedError(
            "cannot use ''method'' argument if any indexers are slice objects"
        )
    indexer = index.slice_indexer(
        _sanitize_slice_element(label.start),
        _sanitize_slice_element(label.stop),
        _sanitize_slice_element(label.step),
    )
    if not isinstance(indexer, slice):
        # unlike pandas, in xarray we never want to silently convert a
        # slice indexer into an array indexer
        raise KeyError(
            "cannot represent labeled-based slice indexer for dimension "
            f"{index_name!r} with a slice over integer positions; the index is "
            "unsorted or non-unique"
        )

elif is_dict_like(label):
    is_nested_vals = _is_nested_tuple(tuple(label.values()))
    if not isinstance(index, pd.MultiIndex):
        raise ValueError(
            "cannot use a dict-like object for selection on "
            "a dimension that does not have a MultiIndex"
        )
    elif len(label) == index.nlevels and not is_nested_vals:
        indexer = index.get_loc(tuple(label[k] for k in index.names))
    else:
        for k, v in label.items():
            # index should be an item (i.e. Hashable) not an array-like
            if isinstance(v, Sequence) and not isinstance(v, str):
                raise ValueError(
                    "Vectorized selection is not "
                    "available along level variable: " + k
                )
        indexer, new_index = index.get_loc_level(
            tuple(label.values()), level=tuple(label.keys())
        )

        # GH2619. Raise a KeyError if nothing is chosen
        if indexer.dtype.kind == "b" and indexer.sum() == 0:
            raise KeyError(f"{label} not found")

elif isinstance(label, tuple) and isinstance(index, pd.MultiIndex):
    if _is_nested_tuple(label):
        indexer = index.get_locs(label)
    elif len(label) == index.nlevels:
        indexer = index.get_loc(label)
    else:
        indexer, new_index = index.get_loc_level(
            label, level=list(range(len(label)))
        )
else:
    label = (
        label
        if getattr(label, "ndim", 1) > 1  # vectorized-indexing
        else _asarray_tuplesafe(label)
    )
    if label.ndim == 0:
        # see https://github.com/pydata/xarray/pull/4292 for details
        label_value = label[()] if label.dtype.kind in "mM" else label.item()
        if isinstance(index, pd.MultiIndex):
            indexer, new_index = index.get_loc_level(label_value, level=0)
        elif isinstance(index, pd.CategoricalIndex):
            if method is not None:
                raise ValueError(
                    "'method' is not a valid kwarg when indexing using a CategoricalIndex."
                )
            if tolerance is not None:
                raise ValueError(
                    "'tolerance' is not a valid kwarg when indexing using a CategoricalIndex."
                )
            indexer = index.get_loc(label_value)
        else:
            if isinstance(index, pd.DatetimeIndex): # ADDED LINE
                indexer = index.get_loc(pd.Timestamp(label_value), method=method, tolerance=tolerance)  # ADDED LINE
            elif isinstance(index, pd.TimedeltaIndex):  # ADDED LINE
                indexer = index.get_loc(pd.Timedelta(label_value), method=method, tolerance=tolerance)  # ADDED LINE
            else:  # ADDED LINE
                indexer = index.get_loc(label_value, method=method, tolerance=tolerance)
    elif label.dtype.kind == "b":
        indexer = label
    else:
        if isinstance(index, pd.MultiIndex) and label.ndim > 1:
            raise ValueError(
                "Vectorized selection is not available along "
                "MultiIndex variable: " + index_name
            )
        indexer = get_indexer_nd(index, label, method, tolerance)
        if np.any(indexer < 0):
            raise KeyError(f"not all values found in index {index_name!r}")
return indexer, new_index`

This should be fixed with the release of the next eccodes-python, which now accepts numpy.int64 for setting keys: https://github.com/ecmwf/eccodes-python/commits/develop You can either try the develop branch of eccodes-python, or wait for the release, which should be in a couple of days.

I manually modified line 973 and line 2063 of gribapi/gribapi.py , but it didn't seem to work.

Also, make sure you're using cfgrib from the master branch - this fix was important but is not yet in a release: 3cc01e3

Also, make sure you're using cfgrib from the master branch - this fix was important but is not yet in a release: 3cc01e3

It works!