Explicitly disallow CQL2 keywords as custom function names in CQL2 JSON Schema
aznan2 opened this issue · 3 comments
With the release of RC1, I took a look at the CQL2 JSON Schema and noticed that validating filters got me unexpected results.
First off, my validator would complain that the schema uses $dynamicAnchor
and $dynamicRef
which were intoduced in draft 2020-12, so the value of $schema
should be https://json-schema.org/draft/2020-12/schema
. And following that, additionalItems
was removed in draft 2020-12, so line 224 in isBetweenOperands
should be removed.
After having made those changes it only got worse. Consider two basic equality filters, the first one is correct and the second is incorrect:
{ "op": "=", "args": [ "a", "b" ] }
{ "op": "=", "args": [ "a", "b", "c" ] }
However, when validating against the current schema, the first instance fails validation and the second one passes. This is because in functionRef
in the schema, the op
property can be any string, even =
. The root of the schema has a oneOf
that contains comparisonPredicate
and functionRef
. Since the first instance is both a valid comparisonPredicate
and a valid function named =
, the oneOf
fails. Similarly, the second instance passes the validation, again because it is a valid function named =
.
7.11. Requirements Class "Functions", Requirement 18-A:
The server SHALL support custom functions as defined by the BNF rules function in CQL2 BNF ...
^ BTW, Example 24 still uses the old function syntax for JSON.
BNF, function:
function = identifier "(" {argumentList} ")";
6.4. Identifiers
An identifier is a token that represents a resource or a named part of a resource within a CQL2 expression that is not a CQL2 keyword or command. ... Valid starting characters for identifiers can include the colon (i.e. ":"), the underscore (i.e. "_") and letters of the alphabet (e.g. "A-Z, a-z").
From the above trail, I can deduce that =
is not a valid function name, but the schema validator doesn't know that. For the validation to work the op
property of functionRef
needs to be defined in a way so that it doesn't clash with any predefined keyword. Possibly something like this:
"op": {
"type": "string",
"pattern": "^[:_A-Za-zÀ-ÖØ-öø-ÿͰ-ͽ\u037F-\u1FFE\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD][:_A-Za-zÀ-ÖØ-öø-ÿͰ-ͽ\u037F-\u1FFE\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD.0-9\u0300-\u036F\u203F-\u2040]*$",
"not": {
"enum": [
"and", "or", "not", "like", "casei", "accenti", "between", "in", "isNull", "s_contains", "s_crosses", "s_disjoint", "s_equals", "s_intersects", "s_overlaps", "s_touches", "s_within", "t_after", "t_before", "t_contains", "t_disjoint", "t_during", "t_equals", "t_finishedBy", "t_finishes", "t_intersects", "t_meets", "t_metBy", "t_overlappedBy", "t_overlaps", "t_startedBy", "t_starts", "a_containedBy", "a_contains", "a_equals", "a_overlaps", "div"
]
}
}
I couldn't figure out how to include the last group, "\x10000".."\xEFFFF"
, in the regex. But the pattern could be skipped entirely if "=", "<>", "<", ">", "<=", ">=", "+", "-", "*", "/", "^", "%"
would be included in the enum.
Oh, I missed that issue. Everything works fine now. The only housekeeping thing one could do is remove scalarLiteral
, since that is not referenced anymore.