Unable to retrieve all variables with cfgrib
ChristopheLRTE opened this issue ยท 16 comments
Hello,
I'm trying to handle a Grib2 file from which I can extract all the data using directly the command 'grib_get_data -s stepUnits=0 out.grb'.
So it seems I don't need all stepUnits.
In the other hand, when using cfgrib it is impossible to read it fully. Here is the command I tried first :
import cfgrib
cfgrib.open_datasets('out.grb')
And here is the result :
<ipython-input-36-ccea25a1a3be> in <module>
1 import cfgrib
----> 2 cfgrib.open_datasets('out.grb')
~/.local/lib/python3.6/site-packages/cfgrib/xarray_store.py in open_datasets(path, backend_kwargs, **kwargs)
100 backend_kwargs = backend_kwargs.copy()
101 backend_kwargs["squeeze"] = False
--> 102 datasets = open_variable_datasets(path, backend_kwargs=backend_kwargs, **kwargs)
103
104 type_of_level_datasets = {} # type: T.Dict[str, T.List[xr.Dataset]]
~/.local/lib/python3.6/site-packages/cfgrib/xarray_store.py in open_variable_datasets(path, backend_kwargs, **kwargs)
88 bk["filter_by_keys"] = backend_kwargs.get("filter_by_keys", {}).copy()
89 bk["filter_by_keys"]["paramId"] = param_id
---> 90 datasets.extend(raw_open_datasets(path, bk, **kwargs))
91 return datasets
92
~/.local/lib/python3.6/site-packages/cfgrib/xarray_store.py in raw_open_datasets(path, backend_kwargs, **kwargs)
63 datasets = []
64 try:
---> 65 datasets.append(open_dataset(path, backend_kwargs=backend_kwargs, **kwargs))
66 except DatasetBuildError as ex:
67 fbks.extend(ex.args[2])
~/.local/lib/python3.6/site-packages/cfgrib/xarray_store.py in open_dataset(path, **kwargs)
36 raise ValueError("only engine=='cfgrib' is supported")
37 kwargs["engine"] = "cfgrib"
---> 38 return xr.open_dataset(path, **kwargs) # type: ignore
39
40
~/.local/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
570
571 opener = _get_backend_cls(engine)
--> 572 store = opener(filename_or_obj, **extra_kwargs, **backend_kwargs)
573
574 with close_on_error(store):
~/.local/lib/python3.6/site-packages/xarray/backends/cfgrib_.py in __init__(self, filename, lock, **backend_kwargs)
43 lock = ECCODES_LOCK
44 self.lock = ensure_lock(lock)
---> 45 self.ds = cfgrib.open_file(filename, **backend_kwargs)
46
47 def open_store_variable(self, name, var):
~/.local/lib/python3.6/site-packages/cfgrib/dataset.py in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, **kwargs)
718 return Dataset(
719 *build_dataset_components(
--> 720 index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
721 )
722 )
~/.local/lib/python3.6/site-packages/cfgrib/dataset.py in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
639 read_keys=read_keys,
640 time_dims=time_dims,
--> 641 extra_coords=extra_coords,
642 )
643 except DatasetBuildError as ex:
~/.local/lib/python3.6/site-packages/cfgrib/dataset.py in build_variable_components(index, encode_cf, filter_by_keys, log, errors, squeeze, read_keys, time_dims, extra_coords)
499 }
500 attributes.update(COORD_ATTRS.get(coord_name, {}).copy())
--> 501 data = np.array(sorted(values, reverse=attributes.get("stored_direction") == "decreasing"))
502 dimensions = (coord_name,) # type: T.Tuple[str, ...]
503 if squeeze and len(values) == 1:
TypeError: '<' not supported between instances of 'float' and 'str'
After investigating, I tried to filter like that :
cfgrib.open_datasets('out.grb', backend_kwargs={'filter_by_keys': {'stepUnits': 0}})
It works, BUT I do not retrieve all data. Then I tried with :
cfgrib.open_datasets('out.grb', backend_kwargs={'filter_by_keys': {'stepUnits': 1}})
And it fails with the same above error.
So my question is why I would need to get stepUnits=1 (which seems to generate en error) with cfgrib while with eccodes binary grib_get_data there is no problem ?
And how could I do to use cfgrib without making distinction with stepUnits please?
Thanks for your help,
Christophe
Hello @ChristopheLRTE,
Although I don't have a mixed step type file here to test with, I think the problem is that stepType's 'native type' in ecCodes is string, not a number. And when cfgrib is performing its filtering, it is extracting GRIB keys in their native type and then comparing against your value of zero (see the error message about comparing str with float). My guess is that {'stepUnits': 'instant'}
should work. You can check the strings via
grib_ls -p stepType <gribfile>
Good luck!
Iain
Hello @iainrussell,
You point to the problem which I simply cannot solve in my opinion. It is precisely on messages for which the stepType is 'accum' (and not 'instant') that there is a problem :
$ grib_ls -p shortName,stepType,stepUnits -w shortName=ssrd out.grb
out.grb
shortName stepType stepUnits
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
ssrd accum 1
24 of 101 messages in out.grb
24 of 101 total messages in 1 files
$ grib_ls -p stepRange,endStep -w shortName=ssrd out.grb
out.grb
stepRange endStep
ECCODES ERROR : unable to convert endStep in stepUnits
ECCODES ERROR : unable to get endStep as long (Wrong units for step (step must be integer))
Wrong units for step (step must be integer) stepRange
BUT IF I force stepUnits to 0, it's ok and I get this :
[infoger@PF9SOODESS431 arrow]$ grib_ls -s stepUnits=0 -p stepRange,endStep -w shortName=ssrd out.grb
out.grb
stepRange endStep
0-15 15
0-30 30
0-45 45
0-60 60
0-75 75
0-90 90
0-105 105
0-120 120
0-135 135
0-150 150
0-165 165
0-180 180
0-195 195
0-210 210
0-225 225
0-240 240
0-255 255
0-270 270
0-285 285
0-300 300
0-315 315
0-330 330
0-345 345
0-360 360
24 of 101 messages in out.grb
24 of 101 total messages in 1 files
I would like to translate this behavior with 'cfgrib' rather than with 'grib_ls', 'grib_get', 'grib_get_data', ...
I don't really know if I'm clear :-)
Hi @ChristopheLRTE ,
Would you be able to attach a small subset of this GRIB file (or the whole thing if it's not very large) please? I think we will need to have a closer look!
Cheers,
Iain
Here it is (only two step from the original) : out.zip
Thanks for your help,
Christophe
By the way stepUnits = 0 is the same as "minutes". You can use "m" which is more readable
E.g.
% grib_ls -j -s stepUnits=m -ntime out.grb
{
"dataDate": 20211124,
"dataTime": "0000",
"stepType": "accum",
"stepUnits": "m",
"startStep": 0,
"endStep": 15,
"stepRange": "0-15",
"validityDate": 20211124,
"validityTime": 15
},
{
"dataDate": 20211124,
"dataTime": "0000",
"stepType": "accum",
"stepUnits": "m",
"startStep": 0,
"endStep": 30,
"stepRange": "0-30",
"validityDate": 20211124,
"validityTime": 30
}
Thanks @shahramn, but I would like to use cfgrib instead of grib_ls or grib_get_data, etc.
Moreover, cfgrib doesn't support (as far as I know ...) forcing reading grib with stepUnits to 0 (just filtering on it with filter_by_key,s but I do not retrieve all data)
For @shahramn (but not on the weekend!!) - it's true that there's something dodgy here, whether it's the GRIB file itself, or ecCodes. Download the GRIB file attached to this issue and see:
grib_ls -pstartStep out.grb
out.grb
startStep
0
0
2 of 2 messages in out.grb
grib_ls -pendStep out.grb
out.grb
endStep
ECCODES ERROR : unable to convert endStep in stepUnits
Wrong units for step (step must be integer) endStep
For @iainrussell
The startStep and endStep keys are worked out from the stepRange key. For the 2 messages here we have:
stepRange = 0-15 (in minutes)
stepRange = 0-30 (in minutes)
And stepRange is made up of: (startStep)-(endStep)
So if you try to evaluate startStep which is just 0, there is no problem because 0 in minutes and hours is the same.
But when it comes to evaluating the endStep it fails to convert 15 mins and 30 mins to hours as an integer. The default value of stepUnits being hours and these keys are always integers (not doubles)
Also see here: https://confluence.ecmwf.int/display/UDOC/What+is+the+GRIB+key+stepUnits+-+ecCodes+GRIB+FAQ
I hope this helps.
Thanks Shahram, good observation. In this case, I'm not so sure that we can solve it in cfgrib without something additional in the API so that users can specify the units, or maybe we can do it with some extra intelligence when cfgrib grabs the step from the data. In either case, I don't see a way to do it today without a bit of development in cfgrib.
Hello everyone,
Inside my local messages.py cfgrib
Python file, if I modified the from_filestream()
method from FileIndex
class like that, it works fine for me (it could be a parameter we could pass ?) :
@classmethod
def from_filestream(cls, filestream, index_keys):
# type: (FileStream, T.Sequence[str]) -> FileIndex
offsets = {} # type: T.Dict[T.Tuple[T.Any, ...], T.List[OffsetType]]
index_keys = list(index_keys)
count_offsets = {} # type: T.Dict[int, int]
header_values_cache = {} # type: T.Dict[T.Tuple[T.Any, type], T.Any]
for offset_field, message in filestream.items():
# MY UPDATE START
message['stepUnits'] = 0
# MY UPDATE END
header_values = []
for key in index_keys:
try:
value = message[key]
except:
value = "undef"
if isinstance(value, (np.ndarray, list)):
value = tuple(value)
# NOTE: the following ensures that values of the same type that evaluate equal are
# exactly the same object. The optimisation is especially useful for strings and
# it also reduces the on-disk size of the index in a backward compatible way.
value = header_values_cache.setdefault((value, type(value)), value)
header_values.append(value)
offsets.setdefault(tuple(header_values), []).append(offset_field)
self = cls(filestream=filestream, index_keys=index_keys, offsets=list(offsets.items()))
# record the index protocol version in the instance so it is dumped with pickle
return self
Tell me what you think about it please.
Thank you again,
Christophe
Another solution documented here:
https://gist.github.com/erget/dc9ee910a8125af5bf36873c038eb429#file-working-with-gribs-with-stepunits-hours-ipynb
Please give that a try
Marvellous ! thanks !
I had to change the code a bit, but it's great!
#!/usr/bin/bash
def_path=$(codes_info -d | cut -d : -f 2)
expression="s/'stepUnits.table' = defaultStepUnits/'stepUnits.table' = indicatorOfUnitOfTimeRange/g"
# ^^^^^^^^^^^^^^^^
mkdir -p definitions/grib2
templates=$(grep "stepUnits.table" $def_path/grib2/*.def | cut -d':' -f1 | awk -F'/' '{print $NF}')
for template in $templates
do
new_path=definitions/grib2/$template
cp $def_path/grib2/$template $new_path
sed -i -e "$expression" $new_path
done
export ECCODES_DEFINITION_PATH=$(pwd)/definitions:$(codes_info -d)
The result :
(base) python@HP2:~/arome_pi$ grib_ls out.grb
out.grb
edition centre date dataType gridType typeOfLevel level stepRange shortName packingType
2 lfpw 20211124 fc regular_ll surface 0 0-15 ssrd grid_second_order
2 lfpw 20211124 fc regular_ll surface 0 0-30 ssrd grid_second_order
2 of 2 messages in out.grb
2 of 2 total messages in 1 files
I understood the spirit but I did not understand how the templates are chosen by the framework. Would you know ?
Thanks again !
How the templates are chosen by the framework. Would you know ?
Any idea @shahramn, @iainrussell ?
Thanks again,
Christophe
Hi @ChristopheLRTE,
I just wanted to comment on your modification to the code in cfgrib - I was indeed thinking that this could be passed as a backend argument. The thing that will make it complicated is that we are just finalising a big refactoring of the code that makes the GRIB access more abstract, and in the new code it may not be as clear how to get this working nicely in call cases. So in any case, we will not make any changes to the code until the refactoring is done, because merging will be a nightmare! If your fix works for you (and indeed it's a very reasonable fix just to get that data working) then stick with that, or the solution @shahramn mentioned, whichever you prefer, and we hope to get a proper solution in place in due course.
Cheers,
Iain
Thank you for your explanation, I understand perfectly your point of view. Do not break all your work ๐ !
Thank you both again for your help. This issue can be closed for me (just a little explanation about Templates functioning would be welcome ๐).
Have a good day!
Christophe
How the templates are chosen by the framework. Would you know ?
Any idea @shahramn, @iainrussell ?
Thanks again, Christophe
@ChristopheLRTE
Basically the ecCodes library has a set of definition files (which are ASCII) that define the keys of the GRIB message and some rules. Depending on the content of the message (say a given octet) it loads a given file. So for example if we have a GRIB edition 2 message, the files in the directory "definitions/grib2" are loaded. Also for the various sections there are different files defining the keys for that section, so for GRIB2 section 4 (Product Definition Section) we load either the instantaneous or interval-based files. I am simplifying things a lot here to keep it short.
Hope this helps