CF-Checker Plugin: Recognize variables with standard_names that enforce an additional dimension
Some CF standard names require an additional dimension beyond the temporal and spatial dimensions. These standard names follow one of these construction rules (see Guidelines for Construction of CF Standard Names):
| Rule | Units | Meaning |
|---|---|---|
| ... | ... | ... |
| `histogram_of_X[_over_Z]` | `1` | histogram (i.e. number of counts for each range of X) of variations (over Z) of X. The data variable should have an axis for X. |
| `integral_of_Y_wrt_X` | `[X]*[Y]` | int Y dX. The data variable should have an axis for X specifying the limits of the integral as bounds. |
| ... | ... | ... |
| `probability_distribution_of_X[_over_Z]` | `1` | probability distribution (i.e. a number in the range 0.0-1.0 for each range of X) of variations (over Z) of X. The data variable should have an axis for X. |
| `probability_density_function_of_X[_over_Z]` | `1/[X]` | PDF for variations (over Z) of X. The data variable should have an axis for X. |
| ... | ... | ... |
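As a first step toward the recognition this issue asks for, a plugin could detect whether a standard name follows one of the rules in the table. The sketch below is a hypothetical helper, `extra_axis_rule`, which is not part of the compliance checker. It covers only the rules visible in the excerpt above (the table is elided) and only matches the name prefix, without trying to parse X or Z.

```python
import re
from typing import Optional

# Hypothetical mapping (not compliance-checker code): construction rules from
# the table above that imply an extra axis for the quantity X.
EXTRA_AXIS_RULES = {
    "histogram_of_X[_over_Z]": re.compile(r"^histogram_of_.+"),
    "integral_of_Y_wrt_X": re.compile(r"^integral_of_.+_wrt_.+"),
    "probability_distribution_of_X[_over_Z]": re.compile(r"^probability_distribution_of_.+"),
    "probability_density_function_of_X[_over_Z]": re.compile(
        r"^probability_density_function_of_.+"
    ),
}


def extra_axis_rule(standard_name: str) -> Optional[str]:
    """Return the matching construction rule, or None if no extra axis is implied."""
    for rule, pattern in EXTRA_AXIS_RULES.items():
        if pattern.match(standard_name):
            return rule
    return None


print(extra_axis_rule("probability_density_function_of_air_temperature"))
print(extra_axis_rule("air_temperature"))
```

A checker could use a non-`None` result to decide that one non-spatiotemporal dimension is expected rather than treating it as an unidentified feature.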
Here is an example header using one of these standard names:
```
netcdf test {
dimensions:
    time = UNLIMITED ; // (248 currently)
    lon = 5 ;
    lat = 5 ;
    mlev = 4 ;
    column = 2 ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "days since 1999-10-01 00:00:00" ;
        time:calendar = "proleptic_gregorian" ;
        time:axis = "T" ;
    double lon(lon) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
        lon:axis = "X" ;
    double lat(lat) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
    double mlev(mlev) ;
        mlev:long_name = "level number" ;
        mlev:units = "1" ;
        mlev:axis = "Z" ;
        mlev:positive = "down" ;
    float histogram_of_column_over_some_parameter(time, mlev, lat, lon, column) ;
        histogram_of_column_over_some_parameter:long_name = "Cloud type (subcolumn)" ;
        histogram_of_column_over_some_parameter:units = "1" ;

// global attributes:
        :Conventions = "CF-1.7" ;
        :history = "test" ;
        :title = "test" ;
}
```
If we provide this header to the IOOS Compliance Checker CF plugin, we get:
```
--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report
                                     cf:1.7
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html
--------------------------------------------------------------------------------
                               Corrective Actions
histogram_bad_feature_error_2.cdl has 2 potential issues

                                     Errors
--------------------------------------------------------------------------------
§9.1 Features and feature types
* Unidentifiable feature for variable histogram_of_column_over_some_parameter

                                    Warnings
--------------------------------------------------------------------------------
§2.4 Dimensions
* histogram_of_column_over_some_parameter's dimensions are not in the recommended order T, Z, Y, X. They are time (Unlimited), mlev, lat, lon, column
```
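The §2.4 warning follows from how the dimensions are turned into axis letters. The sketch below rests on a simplified assumption and is not the plugin's actual logic: each dimension takes the `axis` attribute of its coordinate variable, and a dimension without one (here `column`) becomes `U`, matching the `U*` notation in the plugin's `_dims_in_order` docstring. The names `COORD_AXIS` and `axis_string` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified mapping taken from the example header: dimension
# name -> `axis` attribute of its coordinate variable, or None if it has none.
COORD_AXIS: Dict[str, Optional[str]] = {
    "time": "T",
    "mlev": "Z",
    "lat": "Y",
    "lon": "X",
    "column": None,  # no coordinate variable, so no axis attribute
}


def axis_string(dims: List[str], coord_axis: Dict[str, Optional[str]]) -> str:
    """Join the axis letters of `dims`, using "U" for unidentified dimensions."""
    return "".join(coord_axis.get(dim) or "U" for dim in dims)


dims = ["time", "mlev", "lat", "lon", "column"]
print(axis_string(dims, COORD_AXIS))  # TZYXU
```

The trailing `U` after `X` is the part that the ordering check rejects.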
I don't have a fix yet. If I find some time at the beginning of next week, I will try to provide a PR. After that, I will not be available for two months.
The CF Conventions only recommend (§2.4 Dimensions):

> All other dimensions should, whenever possible, be placed to the left of the spatiotemporal dimensions.
However, the IOOS CC CF plugin checks this as a requirement rather than a recommendation, via `regx = regex.compile(r"^[^TZYX]*T?Z?Y?X?$")`:
```python
import regex  # third-party "regex" package used by the plugin


def _dims_in_order(self, dimension_order):
    """
    :param list dimension_order: A list of axes
    :rtype: bool
    :return: Returns True if the dimensions are in order U*, T, Z, Y, X,
             False otherwise
    """
    regx = regex.compile(r"^[^TZYX]*T?Z?Y?X?$")
    dimension_string = "".join(dimension_order)
    return regx.match(dimension_string) is not None
```
I would suggest checking against `"^[^TZYX]*T?Z?Y?X?[^TZYX]*$"` instead. However, this might introduce new issues, because some errors would no longer be caught: if a spatiotemporal dimension is not recognized as such, it could appear on either side of the recognized T, Z, Y, X block without being flagged.
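To see what the relaxed pattern changes, the sketch below runs both patterns against a few axis strings. It uses the standard-library `re` module instead of the third-party `regex` module the plugin imports; both patterns use only basic syntax, so they should behave the same in either. The axis strings (with `U` for an unidentified dimension) and the helper name `in_order` are illustrative assumptions.

```python
import re

# Pattern currently used by the plugin's _dims_in_order.
CURRENT = re.compile(r"^[^TZYX]*T?Z?Y?X?$")
# Relaxed pattern proposed in this issue: also allows unidentified axes after X.
PROPOSED = re.compile(r"^[^TZYX]*T?Z?Y?X?[^TZYX]*$")


def in_order(pattern: re.Pattern, axes: str) -> bool:
    """Return True if the axis string is accepted by the given pattern."""
    return pattern.match(axes) is not None


cases = {
    "UTZYX": "extra dimension first (recommended layout)",
    "TZYXU": "extra dimension last (the histogram example)",
    "TZXU": "an unidentified latitude dimension placed after X",
}

for axes, description in cases.items():
    print(axes, in_order(CURRENT, axes), in_order(PROPOSED, axes), description)
```

Both patterns accept `UTZYX`, and only the proposed one accepts `TZYXU`. The proposed pattern also accepts `TZXU`, which is exactly the loss of error detection described above; out-of-order recognized axes such as `ZT` are still rejected by both.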
Correction: the additional dimension of `histogram_` standard names is expected to go to the left of T, Z, Y, X, as indicated by this example from a mailing-list post by Jonathan Gregory (Fri Oct 14 03:27:22 MDT 2016):
```
// source variable
float tair(time,altitude,latitude,longitude);
    tair:units="K";
    tair:standard_name="air_temperature";
    tair:cell_methods="altitude: mean area: mean time: mean";
// resulting probability_density_function_ variable
float pair(tair,time,altitude);
    pair:standard_name="probability_density_function_of_air_temperature";
    pair:units="K-1";
    pair:cell_methods="altitude: mean time: mean area: sum tair: mean";
    pair:coordinates="latitude longitude"; // to record the ranges
```
This leaves only the taxon-name concept of CF 1.8 as a special case, where the extra dimension (`taxon`) follows `time`, as example 6.1.2 suggests:
```
float abundance(time,taxon) ;
    abundance:standard_name = "number_concentration_of_organisms_in_taxon_in_sea_water" ;
    abundance:coordinates = "taxon_lsid taxon_name" ;
```
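Under the same simplified assumption that any dimension which is not T, Z, Y or X is written as `U`, the PDF example and the taxon example fare differently with the plugin's current pattern. The axis strings below are assumed for illustration, not taken from checker output.

```python
import re

# Pattern currently used by the plugin's _dims_in_order.
CURRENT = re.compile(r"^[^TZYX]*T?Z?Y?X?$")

# Assumed axis strings: "U" stands for the tair bins and the taxon dimension.
examples = {
    "pair(tair, time, altitude)": "UTZ",
    "abundance(time, taxon)": "TU",
}

for signature, axes in examples.items():
    status = "in order" if CURRENT.match(axes) else "flagged"
    print(f"{signature}: {axes} -> {status}")
```

The PDF variable passes because its extra dimension comes first, while `abundance` is flagged because `taxon` follows `time`, which is why taxon variables remain the special case.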