Regex in CMIP6_CV.json to test `*_index` attributes
neumannd opened this issue · 1 comments
The `CMIP6_CV.json` contains regular expressions to test the global attributes `physics_index`, `initialization_index`, `forcing_index` and `realization_index` for correctness. These global attributes should be integers (CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s). Therefore, the CMOR `PrePARE.py` script just checks the type of these attributes and does not use the regex of `CMIP6_CV.json`.
However, the regular expression provided in `CMIP6_CV.json` seems to allow an arbitrary number of `[` in front of and `]` behind the integer. I don't understand why this is done. This seems to contradict CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s.
evaluation of the regular expression
In the `CMIP6_CV.json`, the regex for testing the `*_index` attributes is written as:
^\\[\\{0,\\}[[:digit:]]\\{1,\\}\\]\\{0,\\}$
The first `\` of each `\\` escapes the second `\`. That's clear. Without escapes we have:
^\[\{0,\}[[:digit:]]\{1,\}\]\{0,\}$
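The unescaping step can be reproduced with a JSON parser. The following is a minimal sketch, not taken from CMOR: the key name `index_regex` is a hypothetical placeholder, and only the pattern string itself comes from the issue text.

```python
import json

# Hypothetical JSON fragment: the key "index_regex" is an assumption for
# illustration and is not the real key used in CMIP6_CV.json.
# The raw string keeps every backslash exactly as it appears in the file.
raw = r'{"index_regex": "^\\[\\{0,\\}[[:digit:]]\\{1,\\}\\]\\{0,\\}$"}'

# JSON decoding turns each "\\" pair into a single backslash.
decoded = json.loads(raw)["index_regex"]
print(decoded)
```

The decoded string is the pattern that a regex engine would actually receive.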
I assume that we have a POSIX Basic Regular Expression. That means that `\[` and `\]` are taken literally. `\{n,\}` is interpreted as: "the character left of this expression may appear `n` to infinite times". The `^` and `$` are the beginning and end of a line, respectively. Thus, we have
^ : beginning of the line
\[\{0,\} : `[` appears zero to infinite times
[[:digit:]]\{1,\} : a digit between `0` and `9` appears one to infinite times
\]\{0,\} : `]` appears zero to infinite times
$ : end of the line
These values would be matched by the regular expression:
1
123
42
53253262
But these values would also be matched by the regular expression:
[1435]
[[123]]
[[123]
[123]]
[123]]]]]]]]]
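The behaviour on these values can be checked with a short script. This is only a sketch under one assumption: Python's `re` module does not implement POSIX Basic Regular Expressions or `[[:digit:]]`, so the pattern below is a hand translation into Python syntax, not the string from `CMIP6_CV.json`. The names `CV_PATTERN` and `accepted` are made up for this example.

```python
import re

# Assumption: hand translation of the POSIX BRE into Python `re` syntax.
#   \[\{0,\}           ->  \[*
#   [[:digit:]]\{1,\}  ->  [0-9]+
#   \]\{0,\}           ->  \]*
# fullmatch() anchors at both ends, replacing ^ and $.
CV_PATTERN = re.compile(r"\[*[0-9]+\]*")


def accepted(value: str) -> bool:
    """Return True if the whole string matches the translated pattern."""
    return CV_PATTERN.fullmatch(value) is not None


plain = ["1", "123", "42", "53253262"]
bracketed = ["[1435]", "[[123]]", "[[123]", "[123]]", "[123]]]]]]]]]"]

for value in plain + bracketed:
    print(f"{value!r}: {accepted(value)}")
```

Under this translation, both the plain integers and the bracketed variants are accepted, including the ones with unbalanced brackets.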
I would have expected this regular expression
^[[:digit:]]\\{1,\\}$
or
^[0-9]\\{1,\\}$
or, if POSIX Extended Regular Expression syntax (which provides `+`) is allowed:
^[[:digit:]]+$
^[0-9]+$
Or is this something that should be mentioned in https://github.com/PCMDI/cmor/issues/256?