CF Checker Plugin: extend notification if check for order of dimension fails
Current situation
Dimensions of data variables are recommended to be ordered as follows:
- spatio-temporal dimensions (e.g. X, Y, Z and T) should be in the order: TZYX
- additional dimensions should be located to the left of the spatio-temporal dimensions
This is recommended in section 2.4 of the CF Conventions:
If any or all of the dimensions of a variable have the interpretations of "date or time" (T), "height or depth" (Z), "latitude" (Y), or "longitude" (X) then we recommend, but do not require (see Section 1.4, "Relationship to the COARDS Conventions" ), those dimensions to appear in the relative order T, then Z, then Y, then X in the CDL definition corresponding to the file. All other dimensions should, whenever possible, be placed to the left of the spatiotemporal dimensions.
The check for this is done in cf.CFBaseCheck._dims_in_order and looks as follows:
def _dims_in_order(self, dimension_order):
    """
    :param list dimension_order: A list of axes
    :rtype: bool
    :return: Returns True if the dimensions are in order U*, T, Z, Y, X,
             False otherwise
    """
    regx = regex.compile(r"^[^TZYX]*T?Z?Y?X?$")
    dimension_string = "".join(dimension_order)
    return regx.match(dimension_string) is not None
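To see how the pattern behaves, here is a small standalone sketch. It is not the plugin code: `dims_in_order` is a free function written for illustration, and it uses the standard-library `re` module in place of `regex` (an assumption, made only to avoid the extra dependency for this simple pattern).

```python
import re

# Same pattern as in _dims_in_order, compiled with the stdlib `re` module
# (an assumption for this sketch; the plugin itself uses `regex`).
ORDER_PATTERN = re.compile(r"^[^TZYX]*T?Z?Y?X?$")


def dims_in_order(dimension_order):
    """Return True if the axis types read U*, T, Z, Y, X from left to right."""
    return ORDER_PATTERN.match("".join(dimension_order)) is not None


# Two orders that pass, then two that fail for different reasons.
for axes in (["T", "Z", "Y", "X"], ["U", "T", "X"], ["T", "U", "Y", "X"], ["X", "Y"]):
    print(axes, dims_in_order(axes))
```

The last two inputs fail for different reasons (an other/unknown dimension to the right of T, and X before Y), but the function can only report `False` for both.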
Issue
If the user gets
VARIABLE's dimensions are not in the recommended order T, Z, Y, X. They are DIMENSIONs
it is not obvious whether
- one or more of the dimensions T, Z, Y and/or X were not properly recognized,
- an additional dimension is wrongly located, or
- the dimensions T, Z, Y and/or X are actually wrongly ordered.
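The cases above can be partly separated from the axis-type string alone. The following is a rough sketch under my own assumptions, not existing plugin code: `diagnose_order` and its reason labels are hypothetical, axis types are assumed to be single letters, and `re` stands in for `regex`.

```python
import re

SPATIOTEMPORAL = "TZYX"
ORDER_PATTERN = re.compile(r"^[^TZYX]*T?Z?Y?X?$")
CORE_PATTERN = re.compile(r"^T?Z?Y?X?$")


def diagnose_order(axis_types):
    """Return a list of reasons why the order check fails (empty if it passes).

    Hypothetical helper; the reason labels are made up for this sketch.
    """
    if ORDER_PATTERN.match("".join(axis_types)):
        return []
    reasons = []
    # Look only at the recognized axes and test their relative order.
    core = "".join(a for a in axis_types if a in SPATIOTEMPORAL)
    if CORE_PATTERN.match(core) is None:
        reasons.append("axes_out_of_order")
    # Any other type located right of the first recognized axis.
    first = next(
        (i for i, a in enumerate(axis_types) if a in SPATIOTEMPORAL),
        len(axis_types),
    )
    if any(a not in SPATIOTEMPORAL for a in axis_types[first + 1:]):
        reasons.append("other_dim_right_of_axes")
    return reasons
```

An unrecognized T, Z, Y or X dimension and a misplaced additional dimension both show up as a non-spatio-temporal type, so the string alone cannot tell the first two cases apart. The user can only decide that when the dimension names and their guessed types are both visible.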
Solution
Extend the user output. Although quite long, I would suggest the following text:
VARIABLE's spatio-temporal dimensions are not in the recommended order T, Z, Y, X and/or further dimensions are not located left of T, Z, Y, X. The dims are DIMENSIONs. Their guessed types are DIMENSION_TYPES (U: other/unknown; L: unlimited).
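As a rough sketch of how such a message could be assembled: the function name `dimension_order_message` and its parameters are hypothetical and only illustrate the proposed wording, not how the plugin currently builds its messages.

```python
def dimension_order_message(var_name, dims, axis_types):
    """Build the extended message from a variable name, its dimension
    names, and the axis type guessed for each dimension.

    Hypothetical helper written for this issue, not part of the plugin.
    """
    return (
        "{}'s spatio-temporal dimensions are not in the recommended order "
        "T, Z, Y, X and/or further dimensions are not located left of "
        "T, Z, Y, X. The dims are {}. Their guessed types are {} "
        "(U: other/unknown; L: unlimited).".format(
            var_name, ", ".join(dims), ", ".join(axis_types)
        )
    )


print(
    dimension_order_message(
        "temp", ["time", "station", "lat", "lon"], ["T", "U", "Y", "X"]
    )
)
```

With the names and guessed types side by side, the user can see in this example that `station` was classified as U and sits to the right of `time`.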
Once agreement is reached, I will be happy to create a PR.
Outlook / Additional Question
Considering unlimited dimensions as L-type and not as T-, Z-, Y- or X-type (if they were of such a type) might cause issues. E.g. if the Y dimension was unlimited, we would get a warning in the current situation.
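A simplified sketch of this effect, assuming a variable with dimensions (time, lat, lon) where lat is unlimited. The helper `axis_string` and the plain dict and set are stand-ins I made up for illustration; the plugin works on a netCDF dataset instead, and `re` replaces `regex` here.

```python
import re

ORDER_PATTERN = re.compile(r"^[^TZYX]*T?Z?Y?X?$")


def axis_string(dims, axis_of, unlimited, tag_unlimited=True):
    """Join the axis types of `dims`; if `tag_unlimited` is set, the type of
    an unlimited dimension is replaced by "L" (hypothetical helper)."""
    types = []
    for name in dims:
        if tag_unlimited and name in unlimited:
            types.append("L")
        else:
            types.append(axis_of.get(name, "U"))
    return "".join(types)


dims = ["time", "lat", "lon"]
axis_of = {"time": "T", "lat": "Y", "lon": "X"}
unlimited = {"lat"}  # assumption: the Y dimension is the unlimited one

with_l = axis_string(dims, axis_of, unlimited, tag_unlimited=True)
without_l = axis_string(dims, axis_of, unlimited, tag_unlimited=False)
print(with_l, ORDER_PATTERN.match(with_l) is not None)
print(without_l, ORDER_PATTERN.match(without_l) is not None)
```

With the L type, the axis string becomes `TLX` and the check fails although the dimensions are in T, Y, X order. Without it, the string is `TYX` and the check passes.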
A reply to my issue on the order of dimensions in timeSeries datasets in the CF Discussion Repo suggests that the requirement for unlimited dimensions to be located on the left (CF Section 9.3) might be related to a technical limitation of netCDF3 (only one unlimited dimension, which must be the leftmost):
If there is a need for either the instance or an element dimension to be the netCDF unlimited dimension (so that more features or more elements can be appended), then that dimension must be the outer dimension of the data variable i.e. the leading dimension in CDL.
This means that the identification of unlimited dimensions in the IOOS CC CF Plugin could be dropped. Similarly, the dimension type L could be dropped, which would simplify checking the order of dimensions.
Thanks for bringing this up @neumannd. I have always thought the messages for dimension order were lacking, but not enough attention has been directed at them as of late to garner any improvements. I'm in favor of your proposal to add more context.
Regarding your additional question, are you referring to dropping lines 837 to 838 of compliance-checker/compliance_checker/cf/cf.py (at 58262b2) from the Checker?
OK. Great. I will create a PR for this later on.
Regarding your additional question, are you referring to dropping [...] from the Checker? (lines 837/838 in cf/cf.py:
    if ds.dimensions[coord_name].isunlimited():
        coord_axis_map[coord_name] = "L"
)
Yes. I am not sure whether this status L is needed for other checks. I didn't find any (looked via grep "\"L\""). Therefore, I think that there is no benefit in knowing which dimension is unlimited. Also no unit test seems to depend on it. Should I open an extra issue for this?
Yes, I think this is a valid discussion we should have. Looking at the CF-1.7 specification, the only mentions of the unlimited dimension occur when referring to appending to an existing file. No restrictions on when to use the unlimited dimension are given, so I think it's worth discussing 👍
I am adding this to the 4.3.4 milestone in the context that it is separate from #838, which will need a lengthier discussion.