Reading National Blend of Models (NBM) Data with cfgrib
jlgutenson opened this issue · 6 comments
What happened?
Hello,
I am attempting to read NBM data using xarray and cfgrib with the following code snippet:
import xarray as xr
nc_ds = xr.open_dataset(
    grib_path,
    engine='cfgrib',
    backend_kwargs={
        'filter_by_keys': {
            'stepType': 'accum',
            'typeOfLevel': 'surface',
        }
    }
)
I expect this to return an xarray Dataset containing the accumulated surface variables (e.g., total precipitation).
Instead, I receive the following error:
cfgrib.dataset.DatasetBuildError: multiple values for key 'subCentre'
Since I've seen others successfully utilize NBM data with xarray and cfgrib, I'm guessing that this is some sort of versioning issue with cfgrib and/or xarray.
Is there a particular version of cfgrib that effectively reads the current NBM data?
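In case it is useful, this is how I am checking the versions involved (a quick sketch; it assumes the eccodes Python bindings are installed in the same environment as cfgrib):

import xarray as xr
import cfgrib
import eccodes  # Python bindings to the ecCodes library that cfgrib uses

# Print the versions of the packages involved in decoding the GRIB file.
print("xarray:", xr.__version__)
print("cfgrib:", cfgrib.__version__)
print("ecCodes:", eccodes.codes_get_api_version())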
What are the steps to reproduce the bug?
Download a CONUS grib2 file from NOMADS.
Run the following code snippet:
import xarray as xr
grib_path = "/full/path/to/grib2/file"
nc_ds = xr.open_dataset(
    grib_path,
    engine='cfgrib',
    backend_kwargs={
        'filter_by_keys': {
            'stepType': 'accum',
            'typeOfLevel': 'surface',
        }
    }
)
Version
0.9.10.4
Platform (OS and architecture)
Ubuntu 22.04.6 (WSL) using Miniconda
Relevant log output
Traceback (most recent call last):
File "/mnt/d/Gutenson_RATES/TWDB-FIF-LRGVDC/2023/Scripts/water_level_obs/scripts/precipitation_zonal_statistics.py", line 174, in <module>
nbm(basin_shapefile, grib_file_directory, precip_csv, basin_precip_data_df=None)
File "/mnt/d/Gutenson_RATES/TWDB-FIF-LRGVDC/2023/Scripts/water_level_obs/scripts/precipitation_zonal_statistics.py", line 108, in nbm
nc_ds = xr.open_dataset(
File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/xarray/backends/api.py", line 572, in open_dataset
backend_ds = backend.open_dataset(
File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 108, in open_dataset
store = CfGribDataStore(
File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 40, in __init__
self.ds = opener(filename, **backend_kwargs)
File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 784, in open_file
return open_from_index(index, read_keys, time_dims, extra_coords, **kwargs)
File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 726, in open_from_index
dimensions, variables, attributes, encoding = build_dataset_components(
File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 695, in build_dataset_components
attributes = build_dataset_attributes(index, filter_by_keys, encoding)
File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 619, in build_dataset_attributes
attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS, filter_by_keys)
File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 277, in enforce_unique_attributes
raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
cfgrib.dataset.DatasetBuildError: multiple values for key 'subCentre'
Accompanying data
https://nomads.ncep.noaa.gov/pub/data/nccf/com/blend/prod/
Organisation
RATES, Inc.
Hi @jlgutenson, there are a lot of files in those folders, which have quite different contents! Could you be more specific about which one you tried so that I can try to replicate it please? (I tried one file, but it was ok.)
Thanks,
Iain
Hi @iainrussell,
Sure thing, apologies for being too vague.
Here is one of the files I'm attempting to use:
Thanks,
Joseph
Hi @jlgutenson,
Thanks for that. That particular file still did not show the problem, but after trying a few more I found that blend.t00z.core.f018.co.grib2 did.

The problem comes from a behaviour of cfgrib whose usefulness is debatable, but in terms of the GRIB file itself: it contains 'accum' parameters for both subCentres 9 and 14 from NCEP. cfgrib wants to construct an xarray Dataset with some global attributes, including subCentre, so that key must be unique across all the GRIB messages it reads. We might want to relax that restriction, although it does have a purpose.

What you can do is add another filter so that only subCentre 14 is used (subCentre 9 seems to be test data). Your code would then look like this:
nc_ds = xr.open_dataset(
    grib_path,
    engine='cfgrib',
    backend_kwargs={
        'filter_by_keys': {
            'stepType': 'accum',
            'typeOfLevel': 'surface',
            'subCentre': 14,
        }
    }
)
That works for me - is that solution acceptable for you?
Cheers,
Iain
Hey @iainrussell,
Thank you! That seems to be doing the trick.
How were you able to discern which subCentre is which?
Thanks,
Joseph
Well, looking closer, I may have been mistaken, because I did not look at all the messages. I used Metview's GRIB Examiner to inspect the data (CodesUI is also available; it is basically Metview's examiner packaged as a standalone tool), and it seemed that all the fields from subCentre 8 also had productionStatusOfProcessedData=1 (Operational test products) - see screenshots. On closer inspection, though, I can see that some of them are in fact operational - I extrapolated too soon! But at least in this case, all the fields for subCentre=8 are instant rather than 'accum'.
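If you prefer to inspect the keys programmatically rather than with a GUI, something like the following should work. It is just a sketch using the eccodes Python bindings; the file name and the particular keys printed are only examples.

from collections import Counter

from eccodes import codes_grib_new_from_file, codes_get, codes_release

grib_path = "blend.t00z.core.f018.co.grib2"

# Count the (subCentre, stepType, productionStatusOfProcessedData)
# combinations across all GRIB messages in the file.
counts = Counter()
with open(grib_path, "rb") as f:
    while True:
        gid = codes_grib_new_from_file(f)
        if gid is None:
            break
        key = (
            codes_get(gid, "subCentre"),
            codes_get(gid, "stepType"),
            codes_get(gid, "productionStatusOfProcessedData"),
        )
        counts[key] += 1
        codes_release(gid)

for (sub_centre, step_type, prod_status), n in sorted(counts.items()):
    print(f"subCentre={sub_centre} stepType={step_type} "
          f"productionStatus={prod_status}: {n} messages")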
If you really want to make sure you've got everything, the following will load the data into multiple datasets without you having to specify the subCentre. With this I noticed there is also one field from subCentre=9, although it is also an operational test product.
import cfgrib

grib_path = "blend.t00z.core.f018.co.grib2"
nc_ds = cfgrib.open_datasets(
    grib_path,
    engine='cfgrib',
    backend_kwargs={
        'filter_by_keys': {
            'stepType': 'accum',
            'typeOfLevel': 'surface',
        }
    }
)
print(nc_ds)
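Once you have that list back, you can check which subCentre each dataset came from. A minimal sketch; it assumes cfgrib stores the subCentre in the GRIB_subCentre global attribute, so check ds.attrs if your version names it differently:

# nc_ds is a list of xarray Datasets, one per compatible group of GRIB messages.
for i, ds in enumerate(nc_ds):
    print(f"dataset {i}: subCentre={ds.attrs.get('GRIB_subCentre')}, "
          f"variables={list(ds.data_vars)}")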
Hey @iainrussell,
Thank you! I wasn't familiar with Metview; I've been using Panoply. Everything appears to be working on my end. I'll close this "bug" out. Enjoy your weekend!
Thanks again,
Joseph Gutenson