Cfgrib only reads some variables
Opened this issue · 7 comments
Hello,
I am trying to open some HRRR GRIB files to extract smoke forecast information. The files are available on NOMADS or GCP and should all have the same layout regardless of valid time: https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/. I'm working with the CONUS wrfnat files.
I'm opening these files with cfgrib and 'filter_by_keys':
import xarray as xr
ds = xr.open_dataset('hrrr.t00z.wrfnatf12.grib2', engine='cfgrib', filter_by_keys={'typeOfLevel': 'hybrid'})
This produces a nice xarray dataset:
<xarray.Dataset>
Dimensions: (hybrid: 50, x: 1799, y: 1059)
Coordinates:
time datetime64[ns] ...
step timedelta64[ns] ...
* hybrid (hybrid) float64 1.0 2.0 3.0 4.0 5.0 ... 47.0 48.0 49.0 50.0
latitude (y, x) float64 ...
longitude (y, x) float64 ...
valid_time datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables: (12/16)
pres (hybrid, y, x) float32 ...
clwmr (hybrid, y, x) float32 ...
unknown (hybrid, y, x) float32 ...
rwmr (hybrid, y, x) float32 ...
snmr (hybrid, y, x) float32 ...
grle (hybrid, y, x) float32 ...
... ...
t (hybrid, y, x) float32 ...
q (hybrid, y, x) float32 ...
u (hybrid, y, x) float32 ...
v (hybrid, y, x) float32 ...
w (hybrid, y, x) float32 ...
tke (hybrid, y, x) float32 ...
but with only 16 of the 20 variables that these files are supposed to contain (see the full list from NOAA: https://rapidrefresh.noaa.gov/hrrr/HRRRv4_GRIB2_WRFNAT.txt).
Unfortunately, one of the four missing variables happens to be the smoke field I'm after.
This issue looks somewhat similar to #66, #139, #45 and #217, but no warning messages are thrown, and I'm not sure it's an issue with multi-field messages, because cfgrib parsed 16 of the 20 variables rather than reducing everything down to just one. Any ideas would be much appreciated!
Hello @jsillin,
It looks to me like some of your fields are not known by ecCodes (the GRIB engine behind cfgrib). You have a variable called 'unknown' because ecCodes could not identify it, and my first suspicion is that there are five such variables: all of them will be called 'unknown' and will therefore be put into the same variable (15 identified variables plus one merged 'unknown' gives the 16 you see, while 15 plus five unidentified fields gives the expected 20). You can check on the command line with grib_ls <gribfile> to see what ecCodes understands from the file. You will probably need to obtain or create local ecCodes tables for this data, although it is strange that some variables are understood.
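To see which raw parameters are hiding behind the single 'unknown' variable, here is a minimal sketch that walks the file message by message, roughly like grib_ls, using the ecCodes Python bindings (assumed to be installed as the `eccodes` package). It assumes ecCodes reports shortName 'unknown' for parameters it has no definition for, and it reads the standard GRIB2 keys discipline, parameterCategory and parameterNumber, which identify a parameter even when it has no name. The helper names list_messages and unknown_parameters are hypothetical.

```python
from collections import Counter


def list_messages(path):
    """Yield one dict of keys per GRIB message, roughly like `grib_ls`."""
    import eccodes  # imported lazily so unknown_parameters works without it

    keys = ("shortName", "typeOfLevel", "level",
            "discipline", "parameterCategory", "parameterNumber")
    with open(path, "rb") as f:
        while True:
            # returns None once every message in the file has been read
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:
                break
            try:
                yield {k: eccodes.codes_get(gid, k) for k in keys}
            finally:
                eccodes.codes_release(gid)  # free the message handle


def unknown_parameters(records):
    """Count unnamed messages, keyed by their raw GRIB2 parameter codes."""
    return Counter(
        (r["discipline"], r["parameterCategory"], r["parameterNumber"])
        for r in records
        if r["shortName"] == "unknown"
    )
```

For example, `unknown_parameters(list_messages('hrrr.t00z.wrfnatf12.grib2'))` should report how many distinct (discipline, category, number) triplets were merged into 'unknown'; if the suspicion above is right, there would be five. The same information is available on the command line with `grib_ls -p shortName,discipline,parameterCategory,parameterNumber <gribfile>`, and those triplets are what a local ecCodes table would need to define.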
Cheers,
Iain
I've got a similar issue but I am getting an error. I'm working with RAP data.
The error message:
skipping variable: paramId==165 shortName='u10'
Traceback (most recent call last):
File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 653, in build_dataset_components
dict_merge(variables, coord_vars)
File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 584, in dict_merge
raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2.0) new_value=Variable(dimensions=(), data=10.0)
skipping variable: paramId==166 shortName='v10'
Traceback (most recent call last):
File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 653, in build_dataset_components
dict_merge(variables, coord_vars)
File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 584, in dict_merge
raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2.0) new_value=Variable(dimensions=(), data=10.0)
Here's the output of running grib_ls <gribfile>, as suggested. Everything looks normal to me:
rap.grib2
edition centre date dataType gridType stepRange typeOfLevel level shortName packingType
2 kwbc 20211104 fc lambert 0 isobaricInhPa 250 gh grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 250 t grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 250 r grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 250 u grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 250 v grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 500 gh grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 500 t grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 500 r grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 500 u grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 500 v grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 700 gh grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 700 t grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 700 r grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 700 u grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 700 v grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 850 gh grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 850 t grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 850 r grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 850 u grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 850 v grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 925 gh grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 925 t grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 925 r grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 925 u grid_jpeg
2 kwbc 20211104 fc lambert 0 isobaricInhPa 925 v grid_jpeg
2 kwbc 20211104 fc lambert 0 meanSea 0 mslma grid_jpeg
2 kwbc 20211104 fc lambert 0 heightAboveGround 2 2t grid_jpeg
2 kwbc 20211104 fc lambert 0 heightAboveGround 2 2d grid_jpeg
2 kwbc 20211104 fc lambert 0 heightAboveGround 10 10u grid_jpeg
2 kwbc 20211104 fc lambert 0 heightAboveGround 10 10v grid_jpeg
30 of 30 messages in rap.grib2
30 of 30 total messages in 1 files
Hi @karlwx, the problem here is that you have different level types (pressure, meanSea and heightAboveGround). These are not compatible in a single xarray dataset (you could easily have the same height level value as a valid pressure level value, e.g. 100, and then the vertical coordinates would get confused). In short, you need to separate your data using the filtering facility described in the readme, e.g.
import xarray as xr
xr.open_dataset('nam.t00z.awp21100.tm00.grib2', engine='cfgrib',
                backend_kwargs={'filter_by_keys': {'typeOfLevel': 'heightAboveGround'}})
You would need to do this for each typeOfLevel, and therefore end up with one xarray dataset for each. I hope that makes sense!
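A minimal sketch of that suggestion, assuming xarray and cfgrib are installed: build one filter per typeOfLevel and open each level type as its own dataset. The helper names level_type_filters and open_by_level_type are hypothetical; the filter itself is the same backend_kwargs dict as in the example above.

```python
def level_type_filters(level_types):
    """Return one backend_kwargs dict per typeOfLevel."""
    return {
        tol: {"filter_by_keys": {"typeOfLevel": tol}}
        for tol in level_types
    }


def open_by_level_type(path, level_types):
    """Open each typeOfLevel in the GRIB file as a separate xarray Dataset."""
    import xarray as xr  # requires xarray with the cfgrib engine available

    return {
        tol: xr.open_dataset(path, engine="cfgrib", backend_kwargs=kwargs)
        for tol, kwargs in level_type_filters(level_types).items()
    }
```

With the RAP file listed above, this would be `open_by_level_type('rap.grib2', ['isobaricInhPa', 'meanSea', 'heightAboveGround'])`. cfgrib also offers `cfgrib.open_datasets(path)`, which splits a file into a list of compatible datasets automatically; it may be the simpler option when you don't know the level types in advance. As the next comments show, a single typeOfLevel can still hold several incompatible levels, in which case a finer filter is needed.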
Iain
> In short, you need to separate your data using the filtering facility described in the readme
Hi @iainrussell,
Unfortunately, this does not seem to be the only problem here. I've adjusted my code based on your suggestions and still run into the same kind of error:
ds = xr.open_dataset('./data/rap.2021112900.grib2', engine='cfgrib',
backend_kwargs={'filter_by_keys': {'typeOfLevel': 'heightAboveGround'}})
skipping variable: paramId==167 shortName='t2m'
Traceback (most recent call last):
File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 653, in build_dataset_components
dict_merge(variables, coord_vars)
File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 584, in dict_merge
raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=('heightAboveGround',), data=array([1000., 4000.])) new_value=Variable(dimensions=(), data=2.0)
I'm wondering if this is specific to the RAP data I'm using (you can download a test file from https://nomads.ncep.noaa.gov/pub/data/nccf/com/rap/prod/).
Also, I'm wondering if there is a way to list all the different values of typeOfLevel present in the grib file? This would make it much easier to anticipate any issues and build code to open each as a separate dataset.
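One way to answer this, sketched under the assumption that the ecCodes Python bindings (`eccodes`) are installed, is to read typeOfLevel, level and shortName from every message and group them. Grouping by level as well as typeOfLevel matters for the error above: heightAboveGround holds levels 1000 and 4000 alongside 2, and those clash in one dataset. The helper names read_level_keys, group_levels and filters_per_level are hypothetical, and the walrus operator needs Python 3.8 or later.

```python
from collections import defaultdict


def read_level_keys(path):
    """Return (typeOfLevel, level, shortName) for every message in a GRIB file."""
    import eccodes  # lazy import: only needed when reading a real file

    rows = []
    with open(path, "rb") as f:
        # codes_grib_new_from_file returns None once the file is exhausted
        while (gid := eccodes.codes_grib_new_from_file(f)) is not None:
            try:
                rows.append((
                    eccodes.codes_get(gid, "typeOfLevel"),
                    eccodes.codes_get(gid, "level"),
                    eccodes.codes_get(gid, "shortName"),
                ))
            finally:
                eccodes.codes_release(gid)
    return rows


def group_levels(rows):
    """Map each typeOfLevel to {level: sorted list of shortNames}."""
    groups = defaultdict(lambda: defaultdict(set))
    for type_of_level, level, short_name in rows:
        groups[type_of_level][level].add(short_name)
    return {
        tol: {lev: sorted(names) for lev, names in sorted(levels.items())}
        for tol, levels in groups.items()
    }


def filters_per_level(rows):
    """One filter_by_keys dict per distinct (typeOfLevel, level) pair."""
    pairs = sorted({(tol, lev) for tol, lev, _ in rows})
    return [{"typeOfLevel": tol, "level": lev} for tol, lev in pairs]
```

`sorted(group_levels(read_level_keys('rap.grib2')))` then lists the typeOfLevel values present, and each dict from `filters_per_level` should be usable as `filter_by_keys`. Splitting by level is stricter than necessary for isobaricInhPa, where the levels combine fine, so you might keep one filter per typeOfLevel there and split only the problematic types. On the command line, `grib_get -p typeOfLevel <gribfile> | sort -u` gives a quick list of level types.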
Did you solve it? I am having the same problem. After loading the definitions, grib_ls shows the correct output, but for some reason some of the variables are not correctly interpreted by xarray and are therefore lumped into 'unknown'.
Hi @madsobdrupjakobsen, if you are getting 'unknown' parameters, have a look at #230.
Hey, I had the same problem, and I noticed that in the output of grib_ls the shortName field is 10u. I tried adding that to the filters, and it works.
So something like this will do the trick:
import xarray as xr
grib_filepath = "rap.grib2"  # path to your GRIB file
grib_ds_10u = xr.open_dataset(grib_filepath, engine="cfgrib", filter_by_keys={'typeOfLevel': 'heightAboveGround', 'shortName': '10u'})
grib_ds_10v = xr.open_dataset(grib_filepath, engine="cfgrib", filter_by_keys={'typeOfLevel': 'heightAboveGround', 'shortName': '10v'})
Just to make things a little more fun, the data variables in the datasets are actually u10 and v10, not 10u or 10v.
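That naming difference is confusing, but judging by the tracebacks above it is probably not what triggers cfgrib.dataset.DatasetBuildError: key present and new value is different. That error reports a clash on the heightAboveGround coordinate (e.g. 2.0 vs 10.0), and filtering by shortName avoids it because each resulting dataset then holds a single height level.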
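Building on the shortName workaround, here is a minimal sketch that opens each field separately and looks up whatever variable name cfgrib assigned, instead of assuming it equals the GRIB shortName (cfgrib appears to use ecCodes' cfVarName, which is why 10u shows up as u10). The helper open_by_short_name and its opener parameter are hypothetical; opener defaults to xr.open_dataset and exists only so the lookup logic can be exercised without a GRIB file.

```python
def open_by_short_name(path, short_names, type_of_level="heightAboveGround",
                       opener=None):
    """Open each GRIB shortName as its own dataset; return {shortName: DataArray}.

    The single data variable in each filtered dataset is looked up rather than
    assumed to be named after the shortName (e.g. '10u' arrives as 'u10').
    """
    if opener is None:
        import xarray as xr  # requires xarray with the cfgrib engine available
        opener = xr.open_dataset
    fields = {}
    for name in short_names:
        ds = opener(path, engine="cfgrib", backend_kwargs={
            "filter_by_keys": {"typeOfLevel": type_of_level, "shortName": name}})
        (var_name,) = ds.data_vars  # exactly one variable per filtered dataset
        fields[name] = ds[var_name]
    return fields
```

For example, `winds = open_by_short_name('rap.grib2', ['10u', '10v'])` should give `winds['10u']` as the DataArray cfgrib calls u10. If you want both wind components in one dataset, filtering on the level as well, e.g. `filter_by_keys={'typeOfLevel': 'heightAboveGround', 'level': 10}`, should keep u10 and v10 together while excluding the 2 m fields.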