Rainbow reader: _FillValue and _Undetect
egouden opened this issue ยท 13 comments
Description
In the Rainbow reader, _FillValue is set to 0 and _Undetect is not set. This is causing a loss of information.
Normally _Fillvalue should be set to 255 (or the max value for the byte encoding) and _Undetect to 0.
WMO CF Extensions:
Thanks for letting us know! Contributions are always welcome :)
Sorry for the late response, that went somehow under the radar.
Rainbow5 (at least Release 5.57.0) does only have a notion of "no data" which is best translated to _FillValue
(it's zero for all radar moments on disk). There is no _Undetect
comparable value in Rainbow5.
For instance a one byte reflectivity looks like this:
0x00 (0) no data
0x01 (1) -31.5
0x02 (2) -31.0
0x40 (64) 0.0
0x80 (128) 32.0
0xC0 (192) 64.0
0xFE (254) 95.0
0xFF (255) 95.5
We can't do much about the source format.
And I'm not considering that a bug, as specifying the _FillValue
is sufficient for the CfRadial2 standard. ODIM_H5 has nodata
as well as undetect
mentioned, but if I interpret the documentation correctly both are not mandatory values.
Ahh okay! Sorry for the mislabelling here @kmuehlbauer , I appreciate the insight!
No worries @mgrover1, we should invent a label which can be put there instead. How about data-format-issue
?
how about data-standards
?
This is a known tricky issue.
- _Undetect
It is an optional attribute in FM301. It is defined as follow: "Indicates an area (range bin) that has been radiated but has not produced a valid echo".
It is a mandatory metadata in ODIM. It is defined as "Raw value used to denote areas below the measurement detection threshold (radiated but nothing detected)".
By convention, this special kind of value is often coded as zero.
In the documentation of Rainbow it is described as "no data". But this is misleading. It actually corresponds to _Undetect. It should not be considered as "missing data" using _FillValue.
- _FillValue
This is defined in CFRADIAL as "Indicates data are missing at this range bin" This has apparently been dropped in FM301 in favor of flag_values.
In ODIM 2.4 you have the similar mandatory (if applicable) NODATA, defined as "raw value used to denote areas void of data (never radiated)". In OPERA, it is also used when bad data have been removed.
There is actually no "missing value" in the Rainbow format. I was wrong here. So _FillValue should simply not be set.
I hope this clarifies my point of view.
@egouden Great explanation. It's definitely confusing. The major pain point here is the netcdf standard which only knows _FillValue
but no nodata or undetect whatsoever. It has a notion of missing_value, though.
So as default, when creating a netcdf variable you can specify a fill_value. If you leave it out, netcdf will take it's default value for that datatype as fill_value but it will not be attached to the variable.
It looks like we have to make sure we use apply the correct metadata etc on writing the different standards.
For UBYTE, the default _FillValue is 255. If this is a valid maximum value for a given format (e.g. Rainbow), one could replace 255 by 254 while setting 255 as _FillValue for clarity.
Note that a 255 valid value is most probably not meteorological. And it might be reset to 255 later by post processing.
This is actually a more general issue on the definition of special values in our data model. Is missing value "undetect" or "no valid data" or both? I think this is not clear in FM301. I will ask the OPERA ODIM expert for clarification.
We will need to be careful here at implementation time, as xarray
maps everything to np.nan
(_FillValue
, missing_value
) but it is not able to tell that apart when writing.
I've also added links to the relevant WMO CF Extensions in the OP above.
This is an excerpt from the WMO CF Extensions. The only mandatory attribute in our context is valid_range
, providing smallest and largest valiud values.
_FillValue
is denoting missing or undefined data and is not mandatory (but it might be good practice to use the netcdf default fill_value here, if no other value is provided).
Based on the this Open Radar Discourse thread it seems that we need to add more metadata to the rainbow reader, something pretty similar to what is in this function:
def add_metadata_to_rainbow(dtree):
data = np.array('axis_z', dtype='|S32')
attributes = {
'standard_name': 'primary_axis_of_rotation',
'options': 'axis_z, axis_y, axis_x'
}
# Create the xarray DataArray
dtree['primary_axis'] = xr.DataArray(data, attrs=attributes, dims=())
fixed_angles = []
for grp in dtree.groups:
if "sweep" in grp:
fixed_angle = dtree[grp]['sweep_fixed_angle'].values
fixed_angles.append(fixed_angle)
dtree[grp]['sweep_mode'] = xr.DataArray(
np.array(dtree[grp]['sweep_mode'], dtype='|S32'),
attrs={'standard_name': 'scan_mode_for_sweep',
'options': 'sector, coplane, rhi, vertical_pointing, idle, azimuth_surveillance, \
elevation_surveillance, sunscan, pointing, calibration, manual_ppi, manual_rhi'})
fixed_angles = np.array(fixed_angles, dtype=np.float32)
sweep_groups = np.array([f'sweep_{i}' for i in range(fixed_angles.size)])
dtree['sweep_fixed_angle'] = xr.DataArray(fixed_angles, dims=('sweep'))
dtree['sweep_group_name'] = xr.DataArray(sweep_groups, dims=('sweep'))
dtree['volume_number'] = xr.DataArray(np.array(0, dtype=np.int16),
attrs={'standard_name':'data_volume_index_number'})
dtree['instrument_type'] = xr.DataArray(np.array('radar', dtype='|S32'),
attrs={'standard_name': 'type_of_instrument',
'options': 'radar, lidar',
'meta_group': 'instrument_parameters'}
)
dtree['platform_type'] = xr.DataArray(
np.array('fixed', dtype='|S32'),
attrs={'standard_name': 'platform_type',
'options':'fixed, vehicle, ship, aircraft_fore, aircraft_aft, aircraft_tail, aircraft_belly,\
aircraft_roof, aircraft_nose, satellite_orbit, satellite_geostat'})
return dtree