Turn off only coordinate renaming in preprocessor

Question


Closed this issue 2 years ago · 6 comments

Ocean model data are often defined on multiple grids that must be analyzed together. In the case of MOM6/OM4, this includes tracer quantities at the cell centers and transport quantities at the cell edges and corners. Thus, the notion of a single latitude/longitude coordinate system is ill-posed (see Additional context). A flag is needed that preprocesses input variables but skips the coordinate renaming step.

The proposed solution is to add an attribute to the data/fieldlist_*.jsonc entries that turns off coordinate renaming, e.g.:

    "vmo": {
      "standard_name": "ocean_mass_y_transport",
      "units": "kg s-1",
      "rename_coords": false
    },

The default behavior would still be to rename the coordinates (i.e. "rename_coords": true) if the attribute is omitted.
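A minimal sketch of how a preprocessor could read the proposed flag and apply the default, assuming the fieldlist entries have already been loaded into a dict (the standard json module does not accept .jsonc comments, so the snippet below has none). The helper name should_rename_coords is hypothetical and not part of the MDTF codebase.

```python
import json

# Fieldlist entries as they might appear in data/fieldlist_*.jsonc,
# with comments stripped so the standard json module can parse them.
FIELDLIST_SNIPPET = """
{
  "vmo": {
    "standard_name": "ocean_mass_y_transport",
    "units": "kg s-1",
    "rename_coords": false
  },
  "thetao": {
    "standard_name": "sea_water_potential_temperature",
    "units": "degC"
  }
}
"""


def should_rename_coords(entry: dict) -> bool:
    """Hypothetical helper: return the entry's rename_coords setting,
    defaulting to True when the attribute is omitted."""
    return bool(entry.get("rename_coords", True))


fieldlist = json.loads(FIELDLIST_SNIPPET)
for name, entry in fieldlist.items():
    # vmo opts out of renaming; thetao falls back to the default (True).
    print(name, should_rename_coords(entry))
```

Defaulting to true keeps existing fieldlists working unchanged; only entries that explicitly set "rename_coords": false change behavior.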

Additional context

This is in support of running OM4Labs from the MDTF framework.
xgcm could be considered for future versions of the preprocessor.

Answer 1 · 2022-06-13T14:22:06.000Z

@wrongkindofdoctor, @aradhakrishnanGFDL, @Wen-hao-Dong --- any comments or concerns on this proposal?

Answer 2 · 2022-06-13T15:29:55.000Z

@jkrasting The proposed rename_coords flag sounds like a reasonable solution for handling the ocean data at this time.

Answer 3 · 2022-06-13T22:22:46.000Z

@jkrasting can you provide an example, with the actual variable names, of a coordinate renaming that needs to be skipped, assuming it's based on this?

For a POD developer, what is the takeaway/guidance for this feature and reference?

Answer 4 · 2022-06-14T00:45:26.000Z

Sure thing @aradhakrishnanGFDL.

Consider a simple POD that analyzes thetao and vmo in GFDL's native pp format. These variables are defined on two different grids: thetao is defined at the cell centers (yh, xh), while vmo is defined at the cell's northern face (yq, xh).

	float thetao(time, z_l, yh, xh) ;
		thetao:long_name = "Sea Water Potential Temperature" ;
		thetao:units = "degC" ;
		thetao:missing_value = 1.e+20f ;
		thetao:_FillValue = 1.e+20f ;
		thetao:cell_measures = "volume: volcello" ;
		thetao:standard_name = "sea_water_potential_temperature" ;
		thetao:cell_methods = "area:mean z_l:mean yh:mean xh:mean time: mean" ;
		thetao:time_avg_info = "average_T1,average_T2,average_DT" ;

	float vmo(time, z_l, yq, xh) ;
		vmo:long_name = "Ocean Mass Y Transport" ;
		vmo:units = "kg s-1" ;
		vmo:missing_value = 1.e+20f ;
		vmo:_FillValue = 1.e+20f ;
		vmo:standard_name = "ocean_mass_y_transport" ;
		vmo:cell_methods = "z_l:sum yq:point xh:sum time: mean" ;
		vmo:time_avg_info = "average_T1,average_T2,average_DT" ;

The MDTF preprocessor is not flexible enough to fully support multiple grids, so by default it tries to infer a single grid and assign that same grid to every variable. It also renames the coordinates in the process. If you look at yh and yq, both are classified as latitude by MDTF's inference rules:

	double yh(yh) ;
		yh:long_name = "h point nominal latitude" ;
		yh:units = "degrees_north" ;
		yh:axis = "Y" ;
	double yq(yq) ;
		yq:long_name = "q point nominal latitude" ;
		yq:units = "degrees_north" ;
		yq:axis = "Y" ;
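The sketch below is a simplified stand-in for those inference rules (an assumption, not MDTF's actual code): it classifies a coordinate as latitude from its units or axis attribute, which is enough to show why yh and yq both match. The target name lat is likewise assumed for illustration.

```python
from typing import Dict, Optional

# Attributes copied from the ncdump header above.
COORD_ATTRS = {
    "yh": {"long_name": "h point nominal latitude", "units": "degrees_north", "axis": "Y"},
    "yq": {"long_name": "q point nominal latitude", "units": "degrees_north", "axis": "Y"},
}

# Hypothetical, simplified rule set; the real rules may check more attributes.
LAT_UNITS = {"degrees_north", "degree_north", "degrees_N", "degree_N"}


def infer_axis(attrs: Dict[str, str]) -> Optional[str]:
    """Return 'lat' if the attributes identify a latitude coordinate."""
    if attrs.get("units") in LAT_UNITS or attrs.get("axis") == "Y":
        return "lat"
    return None


inferred = {name: infer_axis(attrs) for name, attrs in COORD_ATTRS.items()}
print(inferred)  # {'yh': 'lat', 'yq': 'lat'}
```

Nothing in these attributes distinguishes the h-point latitude from the q-point latitude, so any rule based on units or axis alone maps both to the same name.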

If a POD uses both of these variables, the framework preprocesses yh and yq and renames both of them to latitude. This leads to KeyErrors when latitude has already been defined from one of them (say, yh) and the framework then tries to overwrite it with yq. It can also cause one of the variables to be interpreted on the wrong grid.
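The failure can be reproduced in miniature with a hypothetical register_coords step that keeps one shared table mapping each coordinate name to the native coordinate behind it. This is a sketch of the mechanism described above, not MDTF's implementation; the names INFERRED, VAR_DIMS and register_coords, and the target names lat and lon, are assumptions.

```python
from typing import Dict, List

# Simplified inference results (hypothetical target names).
INFERRED = {"yh": "lat", "yq": "lat", "xh": "lon"}

# Horizontal dimensions of each variable, from the ncdump headers above.
VAR_DIMS = {"thetao": ["yh", "xh"], "vmo": ["yq", "xh"]}


def register_coords(var_dims: Dict[str, List[str]],
                    rename_flags: Dict[str, bool]) -> Dict[str, str]:
    """Record which native coordinate backs each coordinate name.

    Raises KeyError when two different native coordinates would be
    renamed to the same name.
    """
    shared: Dict[str, str] = {}  # coordinate name -> native source name
    for var, dims in var_dims.items():
        for dim in dims:
            # rename_coords defaults to True when a variable has no flag.
            name = INFERRED[dim] if rename_flags.get(var, True) else dim
            if name in shared and shared[name] != dim:
                raise KeyError(
                    f"{var}: {name!r} already defined by {shared[name]!r}, "
                    f"cannot redefine it from {dim!r}"
                )
            shared[name] = dim
    return shared


# Default behavior: yh and yq are both renamed to 'lat', so vmo collides.
try:
    register_coords(VAR_DIMS, {})
except KeyError as err:
    print("KeyError:", err)

# With rename_coords false for vmo, yq keeps its native name.
print(register_coords(VAR_DIMS, {"vmo": False}))
```

Opting vmo out of renaming sidesteps the collision without changing how thetao is handled.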

A flag already exists to turn off strict enforcement of a single grid, which allows multiple coordinates to coexist, but there is no flag to turn off coordinate renaming. This was probably intended but never fully implemented.

This change mainly affects how a data center defines its variable conventions (i.e. through the fieldlist*.jsonc files). In GFDL's case, it is needed for our post-processed ocean data because, to save space, we do not include the two-dimensional coordinates (geolat/geolon) with each variable. Other CF-compliant data (e.g. most CMIP output) pose no issue, since the two-dimensional coordinates are repeated in every file and have unique names for the different grids (geolat_v, geolon_v).

From the POD developer's perspective, nothing really changes. The change mainly affects how the framework interfaces with a source dataset such as GFDL's pp format.

Answer 5 · 2022-06-15T18:54:01.000Z

Makes sense, John. No issues. One minor comment: since time, etc. are also coordinate variables, the framework needs to know precisely that this feature request pertains only to grid coordinates.

Answer 6 · 2022-06-15T23:19:42.000Z

Good point, @aradhakrishnanGFDL. I took this into account via 104ace7. The old renaming rules will still always apply to any diagnostic.VarlistTimeCoordinate instance.
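A minimal sketch of the resulting rule, assuming a simple record type in place of the framework's real coordinate classes. The Coord dataclass, its is_time field (a stand-in for checking whether a coordinate is a diagnostic.VarlistTimeCoordinate instance), the output_name helper and the native name Time are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Coord:
    """Hypothetical stand-in for a framework coordinate object."""
    native_name: str
    standard_name: str     # name the framework would rename it to
    is_time: bool = False  # stands in for an isinstance check on the time class


def output_name(coord: Coord, rename_coords: bool) -> str:
    """Time coordinates are always renamed; grid coordinates are
    renamed only when the variable's rename_coords flag is true."""
    if coord.is_time or rename_coords:
        return coord.standard_name
    return coord.native_name


coords: List[Coord] = [
    Coord("Time", "time", is_time=True),  # hypothetical native time name
    Coord("yq", "lat"),
    Coord("xh", "lon"),
]
print([output_name(c, rename_coords=False) for c in coords])  # ['time', 'yq', 'xh']
```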