Relaxation of idShort restrictions (Constraint AASd-002)
sebastiankb opened this issue · 12 comments
What
idShort is currently defined very restrictively in the specification:
idShort of Referables shall only feature letters, digits,
underscore ("_"); starting mandatory with a letter, i.e. [a-zA-Z][a-zA-Z0-9_]*.
idShorts like "min-temperature-value" or "modbus:function" are not allowed.
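The effect of the current constraint can be checked with a small sketch. It assumes the underscore belongs in the second character class, as the constraint text states; the helper name `is_valid_idshort_current` is hypothetical and not part of any AAS library.

```python
import re

# Current AASd-002 pattern as quoted above (underscore assumed in the second class).
CURRENT_IDSHORT = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")


def is_valid_idshort_current(name: str) -> bool:
    """Return True if the whole name matches the current pattern."""
    return CURRENT_IDSHORT.fullmatch(name) is not None


for candidate in [
    "minTemperatureValue",
    "min_temperature_value",
    "min-temperature-value",  # rejected: contains "-"
    "modbus:function",        # rejected: contains ":"
]:
    print(candidate, is_valid_idshort_current(candidate))
```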
idShorts have a variable-like character. However, I cannot follow why only "_" is allowed as a special character. This complicates the reuse of existing names (that are also variable-like), e.g., from standards that use prefixes.
I tried to understand where the restrictions come from, and I have received two reasonable answers so far:
- "should not cause conflicts with the idShort path approach of the REST interface."
- "variables in programming languages also do not support special characters"
Mitigation
Point 1 is not really a justification for this restriction: URL paths allow many special characters such as “-”, “$”, “:”, etc.
Point 2 is more justifiable. However, it is hard to generalize, since some programming languages allow “$” in variable names. More seriously: why should idShort names be reflected identically as variable names in a programming language? Which tool or library does this? AAS comes with its own serialization approaches. It makes more sense to reflect the serialization model programmatically (if needed).
Proposal
It makes sense that idShorts have a variable-like character. However, more flexibility in idShort values would be desirable, e.g., to adhere to existing naming conventions that also use “-” and “:” in names.
Proposal 1: Allow more special characters such as “-” and “:”
and/or
Proposal 2: Allow “%” --> allows URL encoding (all special characters can be reflected)
Note: Proposal 1 and 2 are backward compatible!
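To illustrate Proposal 2, here is a minimal sketch of how percent-encoding could carry an arbitrary name through an identifier that only permits letters, digits, a few unreserved characters and “%”. It uses Python's standard `urllib.parse`; the helper name `encode_name` is hypothetical, and whether an AAS tool would encode names this way is an assumption.

```python
from urllib.parse import quote, unquote


def encode_name(name: str) -> str:
    # safe="" so that reserved characters such as ":" and "/" are encoded too.
    return quote(name, safe="")


encoded = encode_name("modbus:function")
print(encoded)           # modbus%3Afunction
print(unquote(encoded))  # modbus:function

# "-" is an unreserved URL character, so it is left untouched.
print(encode_name("min-temperature-value"))  # min-temperature-value
```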
There is a third reason:
- "should not cause conflicts with the idShort path approach of the REST interface."
- "variables in programming languages also do not support special characters"
- the Value-Only approach of the http/REST API is based on the idShort-Names, so only names valid in JSON should be allowed
We additionally do have a display name (in different languages even) for more elaborate names
Mitigation to the third reason: JSON key names are very flexible, e.g. spaces, ":", "-", etc. are also permitted. Also see RFC 8259.
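As a quick check of the JSON argument, the following sketch round-trips an object whose keys contain "-", ":" and a space through Python's standard `json` module. The key names and values are illustrative only and do not come from a real Value-Only payload.

```python
import json

# Illustrative Value-Only style object; keys and values are made up.
value_only = {
    "min-temperature-value": 4.5,
    "modbus:function": 3,
    "name with spaces": "ok",
}

text = json.dumps(value_only)
restored = json.loads(text)

# All keys survive serialization and parsing unchanged.
print(restored == value_only)  # True
```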
We additionally do have a display name (in different languages even) for more elaborate names
I know, but it's not the same, as display names are mainly used for (human-readable) UI purposes. This is more about keeping the naming convention for idShorts, especially for terms/variables that already exist, e.g. from standards. Many RFC and W3C standards use variable names which include "-" and ":" characters. In terms of interoperability and understandability, it would be nice if established names could be used as is.
Discussion in Workstream AAS on 2023-11-23
- impact on existing implementations might be huge, because they rely on the restrictive definition of idShort
- especially special characters like % and : might lead to problems, not all standard proxies might be able to deal with them
- more difficult to implement because all cases need to be considered
Proposal:
allow "-"
Wish for 4.0:
idShort of Referables shall only feature letters, digits, hyphen ("-") and
underscore ("_"); starting mandatory with a letter and not ending with a hyphen or underscore, i.e. a-zA-Z? .
Decision:
For backward compatibility, we need to allow an underscore at the end of the idShort:
idShort of Referables shall only feature letters, digits, hyphen ("-") and
underscore ("_"); starting mandatory with a letter and not ending with a hyphen, i.e. a-zA-Z? .
To be checked: regular expression with "-": correct like this?
this should be the right regular expression:
^[a-zA-Z][a-zA-Z0-9_-]*[a-zA-Z0-9_]+$
I think the regular expression is not backward compatible because it requests at least two characters (we relaxed this constraint with V3.0RC02)
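The two-character minimum can be reproduced with a short sketch. It uses Python's `re` module as one possible regex engine; the constant name `PROPOSED` is hypothetical.

```python
import re

# First proposal from the thread: a letter, optional middle part, then at
# least one trailing character that is not a hyphen.
PROPOSED = re.compile(r"^[a-zA-Z][a-zA-Z0-9_-]*[a-zA-Z0-9_]+$")

for name in ["a", "ab", "a-b", "a-", "a_"]:
    print(repr(name), bool(PROPOSED.match(name)))

# "a" does not match: the trailing "+" always demands a second character,
# so one-letter idShorts would become invalid under this pattern.
```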
This should be correct:
^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) $
Another issue: in Annex C Backus-Naur-Form we do not explain the characters ^ and $. What do they mean?
^ means beginning of line
$ means "end of line"
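The role of the two anchors can be seen in a small sketch. It assumes Python's `re` module, where without the MULTILINE flag `^` matches only at the start of the string and `$` at the end of the string (or just before a final newline); other engines may differ in detail.

```python
import re

UNANCHORED = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")
ANCHORED = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]*$")

text = "9 bad-name!"

# Without anchors the pattern is satisfied by a fragment ("bad") inside the text.
print(bool(UNANCHORED.search(text)))  # True

# With anchors the pattern has to cover the text from beginning to end.
print(bool(ANCHORED.search(text)))       # False
print(bool(ANCHORED.search("badName")))  # True
```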
see decision #295 (comment)
#Constraint AASd-002:# _idShort_ of __Referable__s shall only feature letters, digits, hyphen ("-") and underscore ("_"); starting mandatory with a letter, and not ending with a hyphen, i.e. ^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) $.]
For SMT submodel elements (see https://industrialdigitaltwin.org/en/content-hub/create-a-submodel) also other special characters like "{000}" are used.
Alternatives:
a) make three constraints, one for submodel elements with a Submodel instance and one for submodel elements within a Submodel template and one for elements not being a submodel element but referable
b) only one constraint for submodel instances (the existing AASd-002). This means everything allowed in SMT
c) extend existing constraint AASd-002
For a) and c) the question is how strict to make it: just
#Constraint AASd-00x:# _idShort_ of __SubmodelElement__s within a Submodel template (Submodel/kind = Template) shall only feature letters, digits, hyphen ("-") and underscore ("_"); starting mandatory with a letter, and not ending with a hyphen. Additionally for wildcards also {00} or {000} is allowed to be used. i.e. ^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) < { 0[0]+ }$.
this should be the right regular expression:
^[a-zA-Z][a-zA-Z0-9_-]*[a-zA-Z0-9_]+$
I think the regular expression is not backward compatible because it requests at least two characters (we relaxed this constraint with V3.0RC02)
This should be correct:
^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) $
The pattern I have proposed takes this restriction into account with at least two characters. You can test this with this tool: https://regex101.com/
One character --> no match
Two characters --> match
For SMT submodel elements (see https://industrialdigitaltwin.org/en/content-hub/create-a-submodel) also other special characters like "{000}" are used.
Alternatives:
a) make three constraints, one for submodel elements with a Submodel instance and one for submodel elements within a Submodel template and one for elements not being a submodel element but referable
b) only one constraint for submodel instances (the existing AASd-002). This means everything allowed in SMT
c) extend existing constraint AASd-002
A good catch. I would prefer to make an exception for templates. On the other hand, it becomes inconsistent if we have different variants of constraints depending on the AAS form (template/instance/type). We should discuss this in the next meeting.
into account with at least two characters
This is exactly NOT valid, 1-letter idShorts are valid as well
Sorry, I misunderstood. However, the regular expression you proposed above does not allow 1-letter idShorts either.
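This can be verified with a short sketch of the alternative pattern with the two branches. It assumes the spaces in that pattern are only for readability, so it is compiled with `re.VERBOSE`, which ignores whitespace outside character classes; the constant name `ALTERNATIVE` is hypothetical.

```python
import re

# Alternative pattern from the thread, spaces treated as layout only.
ALTERNATIVE = re.compile(
    r"^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) $",
    re.VERBOSE,
)

for name in ["a", "ab", "a-b", "a-", "a-b-c"]:
    print(repr(name), bool(ALTERNATIVE.match(name)))

# "a" does not match because the group after the first letter is mandatory.
# "a-b-c" does not match either: a hyphen is only accepted directly after
# the first letter in this pattern.
```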
This version should work:
^[a-zA-Z]([a-zA-Z0-9_-]*[a-zA-Z0-9_]+)?$
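A minimal sketch to check this version against the cases discussed in the thread, again using Python's `re` module as an assumed engine; the helper name `is_valid_idshort` is hypothetical.

```python
import re

# Version with an optional tail, so a single letter is accepted.
IDSHORT = re.compile(r"^[a-zA-Z]([a-zA-Z0-9_-]*[a-zA-Z0-9_]+)?$")


def is_valid_idshort(name: str) -> bool:
    """Return True if the name matches the pattern above."""
    return IDSHORT.match(name) is not None


cases = {
    "a": True,                      # one-letter idShort stays valid
    "a_": True,                     # trailing underscore (backward compatible)
    "min-temperature-value": True,  # hyphens inside the name
    "a-": False,                    # must not end with a hyphen
    "1a": False,                    # must start with a letter
    "modbus:function": False,       # ":" is still not allowed
}

for name, expected in cases.items():
    print(repr(name), is_valid_idshort(name), "expected:", expected)
```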
the regex is: ^[a-zA-Z][a-zA-Z0-9_-]*[a-zA-Z0-9_]+$