ioos/compliance-checker

CF Checker Plugin: extend notification if check for order of dimension fails

Closed this issue · 4 comments

Current situation

Dimensions of data variables are recommended to be ordered as follows:

  • spatio-temporal dimensions (e.g. X, Y, Z and T) should be in the order: TZYX
  • additional dimensions should be located to the left of the spatio-temporal dimensions

This is recommended in section 2.4 of the CF Conventions:

If any or all of the dimensions of a variable have the interpretations of "date or time" (T), "height or depth" (Z), "latitude" (Y), or "longitude" (X) then we recommend, but do not require (see Section 1.4, "Relationship to the COARDS Conventions"), those dimensions to appear in the relative order T, then Z, then Y, then X in the CDL definition corresponding to the file. All other dimensions should, whenever possible, be placed to the left of the spatiotemporal dimensions.

The check for this is done in cf.CFBaseCheck._dims_in_order and looks as follows:

    def _dims_in_order(self, dimension_order):
        """
        :param list dimension_order: A list of axes
        :rtype: bool
        :return: Returns True if the dimensions are in order U*, T, Z, Y, X,
                 False otherwise
        """

        regx = regex.compile(r"^[^TZYX]*T?Z?Y?X?$")
        dimension_string = "".join(dimension_order)
        return regx.match(dimension_string) is not None
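To see which orderings the pattern accepts, here is a minimal standalone sketch. It assumes the standard-library `re` module behaves the same as `regex` for this simple pattern, and the function name `dims_in_order` is only illustrative, not the checker's API.

```python
import re

# Same pattern as in _dims_in_order, compiled with the standard-library
# re module (assumption: equivalent to regex for this pattern).
DIM_ORDER = re.compile(r"^[^TZYX]*T?Z?Y?X?$")


def dims_in_order(dimension_order):
    """Return True if the axis letters follow the order U*, T, Z, Y, X."""
    return DIM_ORDER.match("".join(dimension_order)) is not None


print(dims_in_order(["U", "T", "Z", "Y", "X"]))  # True: extra dim on the left
print(dims_in_order(["T", "Y", "X"]))            # True: Z may be absent
print(dims_in_order(["T", "X", "Y"]))            # False: X before Y
print(dims_in_order(["T", "U", "Y", "X"]))       # False: extra dim not on the left
```

Any letter other than T, Z, Y or X is treated as an "additional" dimension and is only accepted to the left of the spatio-temporal block.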

Issue

If the user gets

VARIABLE's dimensions are not in the recommended order T, Z, Y, X. They are DIMENSIONs

it is not obvious whether

  1. one or more of the dimensions T, Z, Y and/or X were not properly recognized,
  2. an additional dimension is wrongly located, or
  3. the dimensions T, Z, Y and/or X are actually wrongly ordered.

Solution

Extend the user output. Although quite long, I would suggest the following text:

VARIABLE's spatio-temporal dimensions are not in the recommended order T, Z, Y, X and/or further dimensions are not located left of T, Z, Y, X. The dims are DIMENSIONs. Their guessed types are DIMENSION_TYPES (U: other/unknown; L: unlimited).

Once agreement is reached, I will be happy to create a PR.
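As a rough illustration of how such a message could be assembled, the sketch below fills in the placeholders from a variable name, its dimension names and their guessed axis letters. The helper `dim_order_message` and the sample names are hypothetical and not part of the checker.

```python
def dim_order_message(var_name, dims, dim_types):
    """Build the extended message (hypothetical helper, not checker code).

    var_name:  name of the data variable
    dims:      dimension names in file order
    dim_types: guessed axis letter for each dimension (T, Z, Y, X, U or L)
    """
    return (
        "{}'s spatio-temporal dimensions are not in the recommended order "
        "T, Z, Y, X and/or further dimensions are not located left of "
        "T, Z, Y, X. The dims are {}. Their guessed types are {} "
        "(U: other/unknown; L: unlimited).".format(
            var_name, ", ".join(dims), ", ".join(dim_types)
        )
    )


# Made-up example: an unrecognized dimension sits between T and Y.
msg = dim_order_message("temp", ["time", "station", "lat"], ["T", "U", "Y"])
print(msg)
```

Printing the guessed letters next to the dimension names lets the user tell at a glance whether a dimension was misclassified or is simply in the wrong position.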

Outlook / Additional Question

Considering unlimited dimensions as L-type and not as T-, Z-, Y- or X-type (if they were of such a type) might cause issues. E.g. if the Y dimension was unlimited, we would get a warning in the current situation.
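The effect can be reproduced with the ordering pattern alone. This is a sketch under the assumption that an unlimited dimension is passed to the check as the letter "L" instead of its axis letter; `passes` is an illustrative name, and `re` stands in for `regex`.

```python
import re

# Ordering pattern from _dims_in_order (re used in place of regex).
DIM_ORDER = re.compile(r"^[^TZYX]*T?Z?Y?X?$")


def passes(axes):
    """Return True if the axis letters satisfy the ordering pattern."""
    return DIM_ORDER.match("".join(axes)) is not None


# A (time, lat, lon) variable with lat recognized as Y.
print(passes(["T", "Y", "X"]))  # True

# Same variable, but lat is unlimited and therefore labelled "L":
# "L" counts as an additional dimension to the right of T.
print(passes(["T", "L", "X"]))  # False

# An unlimited dimension in the leading position is still accepted.
print(passes(["L", "Y", "X"]))  # True
```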

A reply to my issue on the order of dimensions in timeSeries datasets in the CF Discussion Repo suggests that the requirement for unlimited dimensions to be located on the left (CF Section 9.3) might be related to a technical limitation of netCDF3 (only one unlimited dimension, which must be the leftmost):

If there is a need for either the instance or an element dimension to be the netCDF unlimited dimension (so that more features or more elements can be appended), then that dimension must be the outer dimension of the data variable i.e. the leading dimension in CDL.

This means that the identification of unlimited dimensions in the IOOS CC CF Plugin could be dropped. Similarly, the dimension type L could be dropped, which would simplify checking the order of dimensions.
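A simplified sketch of what dropping the L type would change is given below. Everything here is illustrative: `StubDimension` stands in for a netCDF dimension object (only `isunlimited()` is modelled), and `guess_axis_letters` with its `mark_unlimited` flag is a hypothetical helper, not the checker's actual code.

```python
class StubDimension:
    """Stand-in for a netCDF dimension; only isunlimited() is modelled."""

    def __init__(self, unlimited=False):
        self._unlimited = unlimited

    def isunlimited(self):
        return self._unlimited


def guess_axis_letters(dim_names, dimensions, axis_guess, mark_unlimited=True):
    """Map dimension names to axis letters (hypothetical helper).

    With mark_unlimited=True an unlimited dimension is labelled "L"
    regardless of its guessed axis; with False the guessed axis is kept.
    """
    letters = []
    for name in dim_names:
        letter = axis_guess.get(name, "U")  # "U": other/unknown
        if mark_unlimited and dimensions[name].isunlimited():
            letter = "L"
        letters.append(letter)
    return letters


dimensions = {
    "time": StubDimension(),
    "lat": StubDimension(unlimited=True),
    "lon": StubDimension(),
}
axis_guess = {"time": "T", "lat": "Y", "lon": "X"}
names = ["time", "lat", "lon"]

with_l = guess_axis_letters(names, dimensions, axis_guess)
without_l = guess_axis_letters(names, dimensions, axis_guess, mark_unlimited=False)
print(with_l)     # ['T', 'L', 'X']
print(without_l)  # ['T', 'Y', 'X']
```

Without the L label, the letters passed to the ordering check depend only on the guessed axis type, so an unlimited Y dimension is treated like any other Y dimension.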

Thanks for bringing this up @neumannd. I have always thought the messages for dimension order were lacking, but not enough attention has been directed at them as of late to garner any improvements. I'm in favor of your proposal to add more context.

Regarding your additional question, are you referring to dropping

    if ds.dimensions[coord_name].isunlimited():
        coord_axis_map[coord_name] = "L"

from the Checker?

Thanks for bringing this up @neumannd. I have always thought the messages for dimension order were lacking, but not enough attention has been directed at them as of late to garner any improvements. I'm in favor of your proposal to add more context.

OK. Great. I will create a PR for this later on.

Regarding your additional question, are you referring to dropping [...] from the Checker? (lines 837/838 in cf/cf.py, if ds.dimensions[coord_name].isunlimited(): coord_axis_map[coord_name] = "L")

Yes. I am not sure whether this status L is needed for other checks. I didn't find any (looked via grep "\"L\""). Therefore, I think that there is no benefit in knowing which dimension is unlimited. Also no unit test seems to depend on it. Should I open an extra issue for this?

Therefore, I think that there is no benefit in knowing which dimension is unlimited. Also no unit test seems to depend on it. Should I open an extra issue for this?

Yes, I think this is a valid discussion we should have. Looking at the CF-1.7 specification, the only mentions of the unlimited dimension occur when referring to appending to an existing file. No restrictions on when to use the unlimited dimension are given, so I think it's worth discussing 👍

I am adding this to the 4.3.4 milestone in the context that it is separate from #838, which will need a lengthier discussion.