admin-shell-io/aas-specs

Relaxation of idShort restrictions (Constraint AASd-002)

sebastiankb opened this issue · 12 comments

What

idShort is currently defined very restrictively in the specification:

idShort of Referables shall only feature letters, digits,
underscore ("_"); starting mandatory with a letter, i.e. [a-zA-Z][a-zA-Z0-9_]*.

idShorts like "min-temperature-value" or "modbus:function" are not allowed.
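To see the effect of the current rule, the quoted pattern can be tested directly. A minimal Python sketch; it assumes the pattern has to match the whole idShort (hence `re.fullmatch`), and the names other than the two quoted above are made-up samples:

```python
import re

# Current AASd-002 rule: a letter, followed by letters, digits or underscores.
CURRENT_ID_SHORT = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")

def is_valid_current(id_short: str) -> bool:
    """Return True if id_short satisfies the current (V3.0) character rule."""
    return CURRENT_ID_SHORT.fullmatch(id_short) is not None

for name in ["MinTemperatureValue", "min_temperature_value",
             "min-temperature-value", "modbus:function"]:
    print(f"{name}: {is_valid_current(name)}")
```

Only the first two names pass; the hyphenated and the prefixed name are rejected.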

idShorts have a variable-like character. However, I do not see why "_" is the only special character allowed. This complicates reusing existing names (which are also variable-like), e.g., from standards that use prefixes.

I tried to understand where the restrictions come from, and so far I have received two reasonable answers:

  1. "should not cause conflicts with the idShort path approach of the REST interface."
  2. "variables in programming languages also do not support special characters"

Mitigation

Point 1 is not really a justification for this restriction: URL paths allow many special characters, such as “-”, “$”, and “:”.
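The characters that RFC 3986 permits literally in a URL path segment (its `pchar` rule) can be checked mechanically. A Python sketch; it only covers generic URL syntax and deliberately ignores any separator conventions the AAS REST API's idShort path may add on top of that. The helper name is hypothetical:

```python
import string

# RFC 3986, section 3.3: characters allowed literally in a path segment (pchar),
# i.e. unreserved / sub-delims / ":" / "@" (percent-encoded octets aside).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | {":", "@"}

def needs_percent_encoding(segment: str) -> set[str]:
    """Return the characters of segment that may not appear literally in a path segment."""
    return {ch for ch in segment if ch not in PCHAR}

for name in ["min-temperature-value", "modbus:function", "price$", "a b/c"]:
    print(name, sorted(needs_percent_encoding(name)))
```

The first three names produce an empty list, so they could appear in a path as is; only characters such as space or "/" would need escaping.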
Point 2 is more justifiable. However, it is hard to generalize, since some programming languages (e.g., Java and JavaScript) allow “$” in identifiers. But a serious question: why should idShorts be mirrored one-to-one as variable names in a programming language? Which tool or library does this? AAS comes with its own serialization approaches, so it makes more sense to reflect the serialization model programmatically (if needed).

Proposal

It makes sense that idShorts have a variable-like character; however, more flexibility in idShort values would be desirable, e.g., to adhere to existing naming conventions that also use “-” and “:” in names.

Proposal 1: Allow more special characters such as “-” and “:”

and/or

Proposal 2: Allow “%” --> enables URL (percent-)encoding, so all special characters can be represented
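Proposal 2 could look roughly like this: every character outside the current set is percent-encoded as UTF-8 bytes, and the pattern accepts well-formed `%XX` escapes. A Python sketch; `encode_id_short` and the pattern are hypothetical illustrations, not part of any specification:

```python
import re
from urllib.parse import unquote

# Hypothetical Proposal-2 pattern: a leading letter, then letters, digits,
# underscores, or well-formed percent-escapes (%XX with two hex digits).
PERCENT_ID_SHORT = re.compile(r"[a-zA-Z](?:[a-zA-Z0-9_]|%[0-9A-Fa-f]{2})*")

def encode_id_short(name: str) -> str:
    """Percent-encode (as UTF-8) every character other than ASCII letters, digits and '_'."""
    out = []
    for ch in name:
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)

for name in ["min-temperature-value", "modbus:function"]:
    encoded = encode_id_short(name)
    print(f"{name} -> {encoded}")  # e.g. modbus:function -> modbus%3Afunction
```

`urllib.parse.unquote` recovers the original names. Because the pattern only adds an alternative to the current character set, every idShort that is valid today still matches. One caveat: the leading character must still be a letter, and if such an idShort is itself placed in a URL, its "%" has to be encoded again as "%25", so clients and proxies would need to handle double encoding consistently.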

Note: Proposals 1 and 2 are both backward compatible!

There is a third reason:

  1. "should not cause conflicts with the idShort path approach of the REST interface."
  2. "variables in programming languages also do not support special characters"
  3. the Value-Only approach of the HTTP/REST API is based on idShort names, so only names that are valid in JSON should be allowed

We also have a display name (even in different languages) for more elaborate names.

Mitigation to the third reason: JSON key names are very flexible; e.g., spaces, ":", and "-" are all permitted. See also RFC 8259.
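RFC 8259 defines object member names as arbitrary strings, so JSON itself poses no obstacle. A Python sketch with an illustrative payload (not the normative Value-Only format):

```python
import json

# Illustrative Value-Only-style payload: idShorts used directly as JSON object keys.
payload = {"min-temperature-value": 21.5, "modbus:function": 3, "with space": True}

text = json.dumps(payload)
print(text)  # {"min-temperature-value": 21.5, "modbus:function": 3, "with space": true}

# Keys survive a serialize/parse round trip unchanged.
assert json.loads(text) == payload
```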

We also have a display name (even in different languages) for more elaborate names.

I know, but it's not the same, since display names are mainly used for (human-readable) UI purposes. This is about keeping the naming conventions for idShorts, especially for terms/variables that already exist, e.g., from standards. Many RFC and W3C standards use variable names that include "-" and ":". For interoperability and understandability, it would be good if established names could be used as is.

Discussion in Workstream AAS on 2023-11-23

  • impact on existing implementations might be huge, because they rely on the restrictive definition of idShort
  • special characters like % and : in particular might lead to problems; not all standard proxies may be able to deal with them
  • more difficult to implement because all cases need to be considered

Proposal:

allow "-"

Wish for 4.0:
idShort of Referables shall only feature letters, digits, hyphen ("-") and
underscore ("_"); starting mandatory with a letter and not ending with a hyphen or underscore, i.e. [a-zA-Z]([a-zA-Z0-9_-]*[a-zA-Z0-9])?.

Decision:
Due to backward compatibility, we need to allow an underscore at the end of the idShort:

idShort of Referables shall only feature letters, digits, hyphen ("-") and
underscore ("_"); starting mandatory with a letter and not ending with a hyphen, i.e. [a-zA-Z]([a-zA-Z0-9_-]*[a-zA-Z0-9_])?.
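Before debating regex syntax, it can help to have a reference check written without regular expressions, against which candidate patterns can be compared. A minimal Python sketch; the function name and the `allow_trailing_underscore` switch (covering both the "wish for 4.0" and the decision) are illustrative only:

```python
import string

LETTERS = set(string.ascii_letters)
ALLOWED = LETTERS | set(string.digits) | {"_", "-"}

def is_valid_id_short(id_short: str, allow_trailing_underscore: bool = True) -> bool:
    """Check the relaxed idShort rule character by character (no regex).

    allow_trailing_underscore=True  -> decision: only a trailing hyphen is forbidden
    allow_trailing_underscore=False -> wish for 4.0: no trailing hyphen or underscore
    """
    if not id_short or id_short[0] not in LETTERS:
        return False
    if any(ch not in ALLOWED for ch in id_short):
        return False
    forbidden_last = {"-"} if allow_trailing_underscore else {"-", "_"}
    return id_short[-1] not in forbidden_last

print(is_valid_id_short("min-temperature-value"))                        # True
print(is_valid_id_short("x_"), is_valid_id_short("x_", allow_trailing_underscore=False))  # True False
print(is_valid_id_short("x-"))                                           # False
```

One-letter idShorts such as "a" pass, which keeps the rule backward compatible with V3.0RC02.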

To be checked: is the regular expression with "-" correct like this?

this should be the right regular expression:

^[a-zA-Z][a-zA-Z0-9_-]*[a-zA-Z0-9_]+$

I think the regular expression is not backward compatible because it requires at least two characters (we relaxed this constraint with V3.0RC02).
This should be correct:

^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) $
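Writing both candidates out in Python makes the difference concrete. A sketch in Python's `re` flavour; the spaces in the alternation pattern are presumably only there for readability, so `re.VERBOSE` is used to ignore them:

```python
import re

# Earlier candidate: needs at least two characters.
P_TWO_CHARS = re.compile(r"^[a-zA-Z][a-zA-Z0-9_-]*[a-zA-Z0-9_]+$")
# Alternation candidate, spaces kept as written; re.VERBOSE ignores them.
P_ALTERNATION = re.compile(
    r"^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) $", re.VERBOSE
)

for s in ["a", "ab", "a-b", "ab-cd", "a-"]:
    print(repr(s), bool(P_TWO_CHARS.match(s)), bool(P_ALTERNATION.match(s)))
# 'a' False False
# 'ab' True True
# 'a-b' True True
# 'ab-cd' True False
# 'a-' False False
```

Both candidates reject the one-letter idShort "a". The alternation pattern additionally rejects "ab-cd", because it only allows a hyphen directly after the first letter.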

Another issue: in Annex C (Backus-Naur Form) we do not explain the characters ^ and $. What do they mean?
^ means "beginning of line"
$ means "end of line"
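In most regex flavours, including Python's `re` used in this sketch, `^` and `$` anchor the match at the start and end of the string (of each line only in multi-line mode); without them, a pattern may match just a substring. Flavours differ in details, e.g. Python's `$` also matches just before a trailing newline:

```python
import re

pattern = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]*$")

print(bool(pattern.search("Temperature")))     # True: matches from start to end
print(bool(pattern.search("my Temperature")))  # False: ^ pins the match to the start
print(bool(pattern.search("Temperature\n")))   # True (!): $ also matches before a final newline

# For validation, fullmatch (or \A ... \Z) avoids the trailing-newline surprise.
print(bool(pattern.fullmatch("Temperature\n")))  # False
```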

see decision #295 (comment)

#Constraint AASd-002:# _idShort_ of __Referable__s shall only feature letters, digits, hyphen ("-") and underscore ("_"); starting mandatory with a letter, and not ending with a hyphen, i.e. ^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) $.

For SMT (Submodel Template) submodel elements (see https://industrialdigitaltwin.org/en/content-hub/create-a-submodel), other special characters such as "{000}" are also used.

Alternatives:
a) make three constraints: one for submodel elements within a Submodel instance, one for submodel elements within a Submodel template, and one for Referables that are not submodel elements
b) only one constraint for submodel instances (the existing AASd-002). This means everything is allowed in SMTs
c) extend existing constraint AASd-002

For a) and c), the question is how strict to make it, e.g.:
#Constraint AASd-00x:# _idShort_ of __SubmodelElement__s within a Submodel template (Submodel/kind = Template) shall only feature letters, digits, hyphen ("-") and underscore ("_"); starting mandatory with a letter, and not ending with a hyphen. Additionally, for wildcards, {00} or {000} may also be used, i.e. ^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) < { 0[0]+ }$.
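One possible reading of the template rule: the relaxed idShort, optionally followed by a wildcard suffix of two or more zeros in braces (the "0[0]+" above). A Python sketch under that assumption; whether the wildcard may only appear at the end is not settled here, and the sample names are illustrative:

```python
import re

# Base rule from the decision: letter first, then letters/digits/'_'/'-', no trailing '-'.
BASE = r"[a-zA-Z](?:[a-zA-Z0-9_-]*[a-zA-Z0-9_])?"
# Hypothetical template rule: the base, optionally followed by a {00}, {000}, ... wildcard.
TEMPLATE_ID_SHORT = re.compile(BASE + r"(?:\{00+\})?")
INSTANCE_ID_SHORT = re.compile(BASE)

for name in ["Document{00}", "Marking{000}", "Marking", "Marking{0}", "Marking{00}x"]:
    print(name, bool(TEMPLATE_ID_SHORT.fullmatch(name)), bool(INSTANCE_ID_SHORT.fullmatch(name)))
```

Under this reading, "{00}" and "{000}" are accepted only by the template pattern, while "{0}" and characters after the wildcard are rejected by both.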

this should be the right regular expression:
^[a-zA-Z][a-zA-Z0-9_-]*[a-zA-Z0-9_]+$

I think the regular expression is not backward compatible because it requires at least two characters (we relaxed this constraint with V3.0RC02). This should be correct:

^[a-zA-Z] ([a-zA-Z0-9_-][a-zA-Z0-9_]+ | [a-zA-Z0-9_] ) $

The pattern I proposed takes this restriction into account with at least two characters. You can test it with this tool: https://regex101.com/

One character --> no match

Two characters --> match

For SMT (Submodel Template) submodel elements (see https://industrialdigitaltwin.org/en/content-hub/create-a-submodel), other special characters such as "{000}" are also used.

Alternatives:
a) make three constraints: one for submodel elements within a Submodel instance, one for submodel elements within a Submodel template, and one for Referables that are not submodel elements
b) only one constraint for submodel instances (the existing AASd-002). This means everything is allowed in SMTs
c) extend existing constraint AASd-002

A good catch. I would prefer to make an exception for templates. On the other hand, it becomes inconsistent if we have different variants of constraints depending on the AAS form (template/instance/type). We should discuss this in the next meeting.

into account with at least two characters

Exactly this is NOT acceptable: 1-letter idShorts are valid as well.

Sorry, I misunderstood. However, the regular expression you proposed above does not allow 1-letter idShorts either.

This version should work:

^[a-zA-Z]([a-zA-Z0-9_-]*[a-zA-Z0-9_]+)?$
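This version can be cross-checked against the decided rule written in plain Python, by brute force over all short strings built from one representative per character class. A sketch; the `SIMPLER` variant (without the redundant `+`) is an addition for comparison, not part of the decision:

```python
import itertools
import re
import string

# Final pattern: the optional group lets one-letter idShorts match.
FINAL = re.compile(r"^[a-zA-Z]([a-zA-Z0-9_-]*[a-zA-Z0-9_]+)?$")
# Equivalent variant: one trailing non-hyphen character suffices, no "+" needed.
SIMPLER = re.compile(r"[a-zA-Z](?:[a-zA-Z0-9_-]*[a-zA-Z0-9_])?")

ALLOWED = set(string.ascii_letters + string.digits + "_-")

def rule(s: str) -> bool:
    """The decided rule: letter first, allowed characters only, no trailing '-'."""
    return (
        len(s) >= 1
        and s[0] in string.ascii_letters
        and all(ch in ALLOWED for ch in s)
        and not s.endswith("-")
    )

# One representative per character class, plus a forbidden character (':').
alphabet = "aZ9_-:"
mismatches = [
    s
    for n in range(1, 6)
    for s in map("".join, itertools.product(alphabet, repeat=n))
    if bool(FINAL.fullmatch(s)) != rule(s) or bool(SIMPLER.fullmatch(s)) != rule(s)
]
print(len(mismatches))  # 0
```

An exhaustive check over a small alphabet is not a formal proof, but it is a reasonable sanity check. Plugging the two-character pattern `^[a-zA-Z][a-zA-Z0-9_-]*[a-zA-Z0-9_]+$` into the same harness reports mismatches such as "a".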

the regex is: ^[a-zA-Z][a-zA-Z0-9_-]*[a-zA-Z0-9_]+$