gs1/gs1-syntax-dictionary

Remove use of '|' character in dlpkey

KDean-GS1 opened this issue · 7 comments

The presence of '|' in dlpkey is redundant, as the same restrictions can be discovered by looking at the "ex=" entries. Having the same restrictions in two places leads to the possibility of bugs (one of which is raised in a separate issue).

ex and req restrictions are "subtractive" in the sense that you start from the position of assuming the user-provided AI data (e.g. an AI element string) is valid, and then apply the codified "data relationships" from the Invalid Pairs and Mandatory Associations tables to determine whether any of the rules are violated, making the given element string invalid.
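By way of illustration, the subtractive approach might be sketched as follows (a minimal Python sketch, not the Syntax Engine's actual code; the rule tables shown are hypothetical excerpts):

```python
# "Subtractive" validation: assume the extracted AI set is valid, then reject
# it only if it violates a codified data relationship rule.

# Hypothetical excerpts of the Invalid Pairs / Mandatory Associations tables.
INVALID_PAIRS = [("01", "02")]           # AIs that must not appear together
MANDATORY_ASSOCIATIONS = [("02", "37")]  # first AI present => second required

def relationship_violation(ais):
    """Return a reason string if the AI set breaks a rule, else None."""
    present = set(ais)
    for a, b in INVALID_PAIRS:
        if a in present and b in present:
            return f"AIs ({a}) and ({b}) form an invalid pair"
    for a, b in MANDATORY_ASSOCIATIONS:
        if a in present and b not in present:
            return f"AI ({a}) requires AI ({b})"
    return None  # no rule ruled the data out, so it is accepted

print(relationship_violation({"02", "00"}))        # → AI (02) requires AI (37)
print(relationship_violation({"01", "10", "21"}))  # → None
```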

Since these data relationship rules were defined after the fact (based on field experience with meaningless data), they are necessarily incomplete: "only situations that have proven to pose difficulties in practice are included" (§4.14.1 para. 1). So it boils down to "best efforts".

However, the situation is much improved with GS1 DL URI path components because the set of valid qualified keys is fully specified in the grammar and can therefore be comprehensively reproduced by the dlpkey entries in the Syntax Dictionary.

So in the case of validating a GS1 DL URI path component representing a qualified key, the restrictions are "additive" in the sense that you start from the position of nothing being valid unless it is a direct match against one of the dlpkey rules, which prescribe precisely which path component AI sequences are valid. Once you have validated the path component, you can proceed with AI extraction (from both the path component and query parameters) and then apply the ex and req rules to the extracted set of AIs.
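As a sketch of the additive direction (hypothetical Python with a made-up dlpkey table; it assumes qualifiers are optional but must keep the prescribed relative order):

```python
# "Additive" validation of a DL URI path component: nothing is accepted unless
# the ordered qualifier AIs match one of the alternatives prescribed by dlpkey.

# Hypothetical dlpkey table: key AI -> alternative qualifier sequences.
DLPKEY = {
    "01": [["22", "10", "21"], ["235"]],
}

def is_ordered_subsequence(qualifiers, alternative):
    """True if `qualifiers` appear in `alternative` in the same relative order."""
    it = iter(alternative)
    return all(q in it for q in qualifiers)  # `in` advances the iterator

def path_is_valid(key_ai, qualifier_ais):
    """True if the path component's qualifier sequence matches some alternative."""
    return any(is_ordered_subsequence(qualifier_ais, alt)
               for alt in DLPKEY.get(key_ai, []))

print(path_is_valid("01", ["10", "21"]))   # True  (subsequence of 22,10,21)
print(path_is_valid("01", ["21", "10"]))   # False (order violated)
print(path_is_valid("01", ["10", "235"]))  # False (mixes the two alternatives)
```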

So the meaning of the dlpkey and ex rules, and the point in processing at which each applies, are somewhat distinct.

(In case it isn't clear from the description, in which case I can add more supporting text: in an entry such as dlpkey=22,10,21|235 the "," binds tighter than the "|", i.e. it represents a choice between alternative sequences: 22,10,21 versus 235, and not 22,10,21 versus 22,10,235, so there isn't a parity between "|" and "ex".)
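In code terms, the binding strength means a dlpkey value is split on "|" first and only then on "," (illustrative Python):

```python
def parse_dlpkey(value):
    """Split a dlpkey value into alternative qualifier-AI sequences.

    "," binds tighter than "|", so split on "|" first, then on ",".
    """
    return [alternative.split(",") for alternative in value.split("|")]

print(parse_dlpkey("22,10,21|235"))
# → [['22', '10', '21'], ['235']]: a choice between the whole sequence
#   22,10,21 and the single qualifier 235.
```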

I understand the binding strength, but it seems regardless of which direction you go (subtractive vs. additive), you end up with the same set of rules. My (very low) concerns are of maintenance and duplication of code.

On the maintenance side, we have to be sure that any AI rule we create is properly reflected in the DL rules if it affects a DL composite key. Low concern, of course, because the AI table doesn't change very often, and the addition of an AI that would result in a DL composite key is the kind of thing that happens about as often as a GS1 user understanding the differences between GTIN-8, GTIN-12, GTIN-13, and GTIN-14 on the first reading.

The duplication of code means that we're applying two sets of validations: one to the AI collection, whether from an AI string or a DLURI, and one to the DLURI alone. I'm a fan of reducing code as much as possible. If we end up where we have slightly different rules for AI strings vs. DLURI, then we have a situation where we can create a barcode that has no DLURI representation or vice versa. Add EPC tags into the mix and it can get worse. Again, very low concern because I can't really picture it happening in the real world, but it's the kind of thing that bugs me. :-)

Ultimately, this is likely something that should go through the ID SMG so that we can tighten up the GenSpecs and ensure alignment between that and the DLURI syntax standard.

I'll ask Mark to weigh in as well.

I am in agreement with the principles of "say it once" and of reducing the code footprint.

I can confirm that in the Syntax Engine (a reference implementation for how to apply the Syntax Dictionary to parse, transform and validate AI data) we have code to verify the path component according to dlpkey rules derived from the DLURI grammar, whose sphere of influence is restricted to GS1 DL URIs. The path component validation is performed at parse time, without backtracking, on the basis that if the URI does not meet the requirements of the grammar then it is not a DLURI (rather than being a DLURI with bad AI data, as would be the case if the attribute data in the query parameters failed the general AI rules).

We then have code to verify the AIs themselves, as well as the relationships between the AIs. There's not much overlap in these actions. In the GenSpecs the AI data relationship rules apply universally.

So I think that the sets of checks represent the current reality, that there are strict rules for what is a DL URI qualified-key and looser rules for what is a valid AI element string (*).

I'm not sure that the concept of a valid qualified-key has a concrete implementation in the world of AI element strings (etc.) outside of a DLURI path component - at least not to the point where you can confidently reject data. (I'm not familiar enough with EPC to comment.) It would be very nice if we could replace the existing data relationship tables with rules that could be more readily codified and have rigour approaching the DLURI grammar. The existing tables actually require you to read the descriptive text for each entry to work out how they apply.

If the GenSpecs are ultimately firmed up for generic AI data then the Syntax Dictionary would of course realign. Even then, it would not reduce that code footprint (already tight) since you still need a special function to parse DL URIs - just that it might draw on non-DLURI specific rules.

My main concern is that the Syntax Dictionary artifact itself remains simple to parse (humans and machines) and that it is easy to write code that uses it to process AI data, since it is this that is driving adoption. Some people may link the Syntax Engine directly into their projects, others may study the Syntax Engine as a reference, and most will probably ignore it in practice unless they get stuck... But we nevertheless want everybody who can benefit from it to use the Syntax Dictionary and get their implementations mostly right from day one.

(*) A side note that's worth considering is that the data relationship rules don't actually map directly to AI element strings. For example, in the case of multi-barcode shipping labels the exclusive and mandatory pairings cover the entirety of the AI data on the trade item - so the set of AI reads must be aggregated prior to applying the rules.

Regarding EPCs, every EPC is an instance identifier. In some situations, an EPC corresponds to a single GS1 identification key that is already an instance identifier (e.g. an SSCC). In others, it corresponds to a GS1 identification key that has an optional serial component within it, which must be expressed (e.g. a GRAI with its serial component expressed). In still others, it corresponds to a compound key formed from a class-level primary key (e.g. GTIN) and the AI qualifier that, in combination with the primary identifier, identifies exactly one thing (e.g. (21) Serial Number or (235) TPX for GTIN, but not (22) or (10), because multiple distinct individuals can share the same compound key formed from (01)+(22) or (01)+(10), while they cannot share the same (01)+(21) or (01)+(235)). All valid EPC identifiers are specified within the GS1 Tag Data Standard, which is currently in the process of being updated to v2.0.
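For instance, the instance-level distinction for GTIN compound keys could be tabulated along these lines (a hypothetical Python sketch, not drawn from the Tag Data Standard itself):

```python
# Qualifier AIs that, combined with a class-level GTIN (01), identify exactly
# one individual - versus qualifiers that many distinct individuals can share.
INSTANCE_LEVEL_QUALIFIERS = {"21", "235"}  # serial number, TPX
SHARED_QUALIFIERS = {"22", "10"}           # CPV, batch/lot

def gtin_compound_is_instance_level(qualifier_ais):
    """True if at least one qualifier pins the compound key to one individual."""
    return any(q in INSTANCE_LEVEL_QUALIFIERS for q in qualifier_ais)

print(gtin_compound_is_instance_level(["10", "21"]))  # True:  (01)+(10)+(21)
print(gtin_compound_is_instance_level(["22", "10"]))  # False: shareable
```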

GS1 Digital Link URIs make a distinction between primary identifiers and qualifiers on the one hand, and 'data attribute' AIs on the other, that has not previously been expressed in such a way in the GS1 General Specifications.
The nearest thing we had to that before GS1 Digital Link URI Syntax was a table in the GS1 System Architecture that lists a number of simple keys and compound keys.

On re-reading this issue (and the DLURI syntax rules), I find that I indeed got the binding strength mixed up. The prohibition of using AI 235 in any DLURI path that includes 22, 10, or 21 is of course stronger than the AI rules, so it's possible to create an AI string that can't be expressed as a DLURI, but I don't see it happening in practice.

Mark, I'm wondering now if we shouldn't add EPCs to the syntax dictionary somehow.

I'll leave this issue open for further comments, but I'll close it as "not planned" (due to misunderstanding) once the two of you have had an opportunity to weigh in with any final thoughts.

I think that mappings to EPCs could be useful, especially as the GS1 Digital Link URIs permitted within EPCIS as an alternative to pure identity EPC URNs are a constrained subset of GS1 Digital Link URIs, without query strings and with only the unique instance-level qualifier where appropriate, e.g. (01)+(21), (01)+(235), (414) or (414)+(254).

It would probably be helpful to at least have a table of EPC headers and which AIs they correspond to.

No objections to folding the TDT files into the project if additional details are needed.

I would certainly be happy to fold in data relationship and syntax validation rules for EPCs, especially if these can be expressed within the dictionary by adding additional key-value pairs to the AI entries.

As to the feasibility of this, I don't know, but I will examine the relevant standards at a later stage.