cerfacs-globc/icclim

ENH: Add aliases to auto-detect variables suffixed with "Adjust"

bzah opened this issue · 1 comments

bzah commented
  • icclim version: 5.2
  • Python version: n/a

Description

We already have a feature that detects variables in the input netCDF. The detected names come from the clix-meta standard-names table, which was imported into icclim.
We could extend it to accept prefixed or suffixed keywords.
For example, MeteoFrance DRIAS variables are often suffixed with the "Adjust" keyword (it means that the data went through a pre-processing step).

I see two possible approaches.

  1. The lazy approach would be to only check whether a standard_name is contained in a dataset variable name, instead of the strict equality check done now.
    In that case "tasmaxAdjust" is properly identified as "tasmax", but "not_a_tasmax" is identified as tasmax as well, and "tasmin_tasmax" is also identified as tasmax (if tasmax is the first name tested in the detection loop).

  2. The keyword approach would be to have a list of valid keywords that could suffix or prefix variable names.
    But it raises new questions, @pagecp.
    Taking "tasmax" as an example:

  • Should we accept any letter case: "tasmaxAdjust" (DRIAS), "tasmaxADJUST", "tasmaxadjust"?
  • Should we accept multiple separators such as "", "-" or "_": "tasmax-adjust", "tasmax_adjust"?
  • Should we accept prefixes: "adjust_tasmax"?
  • Are there other common keywords that we should consider?
    CF defines a few standard_name modifiers, but I think they all change the meaning of the variable, which makes them unusable for index computation.
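
To make the trade-off between the two approaches concrete, here is a minimal sketch. It is not icclim's actual detection code: `STANDARD_NAMES` is a hypothetical subset of the clix-meta names, and `detect_by_substring`, `detect_by_keyword` and all their parameters are names made up for illustration. Each open question above is modelled as an option of `detect_by_keyword`.

```python
import re

# Hypothetical subset of standard names; the real list comes from clix-meta.
STANDARD_NAMES = ("tasmax", "tasmin", "tas", "pr")


def detect_by_substring(var_name, standard_names=STANDARD_NAMES):
    """Approach 1 (lazy): first standard name contained in var_name."""
    for name in standard_names:
        if name in var_name:
            return name
    return None


def detect_by_keyword(
    var_name,
    standard_names=STANDARD_NAMES,
    keywords=("Adjust",),          # assumed keyword list
    separators=("", "-", "_"),     # assumed accepted separators
    ignore_case=True,
    allow_prefix=True,
):
    """Approach 2: exact name, or name combined with a known keyword."""
    flags = re.IGNORECASE if ignore_case else 0
    sep = "(?:" + "|".join(re.escape(s) for s in separators) + ")"
    kw = "(?:" + "|".join(re.escape(k) for k in keywords) + ")"
    for name in standard_names:
        escaped = re.escape(name)
        patterns = [escaped, escaped + sep + kw]  # plain name, suffixed name
        if allow_prefix:
            patterns.append(kw + sep + escaped)   # prefixed name
        # fullmatch rejects any extra text around the name and keyword
        if any(re.fullmatch(p, var_name, flags) for p in patterns):
            return name
    return None
```

With these definitions, `detect_by_substring("not_a_tasmax")` and `detect_by_substring("tasmin_tasmax")` both return `"tasmax"`, which is the false-positive problem of approach 1, while `detect_by_keyword` returns `None` for both and still accepts "tasmaxAdjust", "tasmax-adjust" and "adjust_tasmax". Note that `ignore_case=True` in this sketch also relaxes the case of the standard name itself, which may or may not be wanted.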

The Adjust suffix comes from the CORDEX community and experiments, so it is not specific to DRIAS. It identifies variables that have been bias-corrected, and it is used for all CORDEX bias-corrected datasets on ESGF.
So I suggest we stick with this specific suffix and do not generalize the auto-detection (at least for now), to avoid false detections.
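
A minimal sketch of what this suffix-only detection could look like, assuming detection stays an exact lookup and "Adjust" is matched case-sensitively with no separator. `STANDARD_NAMES`, `build_aliases` and `detect_variables` are hypothetical names used for illustration, not icclim's API.

```python
# CORDEX bias-correction suffix, matched exactly (case-sensitive, no separator).
ADJUST_SUFFIX = "Adjust"

# Hypothetical subset of standard names; the real list comes from clix-meta.
STANDARD_NAMES = ("tasmax", "tasmin", "tas", "pr")


def build_aliases(standard_names=STANDARD_NAMES, suffix=ADJUST_SUFFIX):
    """Map every accepted variable name to its standard name."""
    aliases = {}
    for name in standard_names:
        aliases[name] = name           # the plain name is still accepted
        aliases[name + suffix] = name  # e.g. "tasmaxAdjust" -> "tasmax"
    return aliases


def detect_variables(dataset_var_names, aliases=None):
    """Return {dataset variable name: standard name} for recognised variables."""
    if aliases is None:
        aliases = build_aliases()
    # Exact dictionary lookup: no substring or case-insensitive matching.
    return {v: aliases[v] for v in dataset_var_names if v in aliases}
```

For example, `detect_variables(["tasmaxAdjust", "pr", "time_bnds", "tasmax_adjust"])` keeps only "tasmaxAdjust" and "pr". Because the lookup stays an equality check, names such as "not_a_tasmax" or "tasmax_adjust" are never picked up, and more aliases can be added later without changing the detection logic.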