gs1/gs1-syntax-dictionary

Scanned barcode and brackets


To explain my problem, here is a comparison between the GS1 Syntax Engine GUI demo and my own implementation.

1) With GS1 Syntax Engine GUI demo


Input data :

(01)00883873867792(22)AvBn220707(10)220707(21)PC22412085

Detected syntax:

Bracketed AI element string

Barcode message (^ = FNC1):

^010088387386779222AvBn220707^10220707^21PC22412085

GS1 AI element string:

(01)00883873867792(22)AvBn220707(10)220707(21)PC22412085

HRI text:

GTIN (01) 00883873867792
CPV (22) AvBn220707
BATCH/LOT (10) 220707
SERIAL (21) PC22412085

2) With my development:


Input data :

(01)00883873867792(22)AvBn220707(10)220707(21)PC22412085

GS1 AI element string:

(01)00883873867792(10)220707(22)AvBn220707(21)PC22412085

HRI text:

GTIN (01) 00883873867792
BATCH/LOT (10) 220707(22)AvBn220707
SERIAL (21) PC22412085

We can see that AI 22 is not recognized: the pattern [!%-?A-Z_a-z\x22] permits parentheses, so (22)AvBn220707 is consumed as part of the AI 10 value.
How do you handle this case?
Thanks for your help.

Your context isn't clear. In particular, what does this have to do with the Syntax Dictionary project?

It seems like you are reporting an issue with your own code by comparing its output to the results from the GS1 Syntax Engine (a separate project), but have not provided any code.

What is the nature of the help that you are looking for?

The regular expression pattern [!%-?A-Z_a-z\x22] does not appear in the GS1 Barcode Syntax Dictionary at all - see https://raw.githubusercontent.com/gs1/gs1-syntax-dictionary/2023-12-11/gs1-syntax-dictionary.txt

However, it does appear in the new browser for GS1 Application Identifiers in the details at https://ref.gs1.org/ai/22
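For anyone following along, here is a minimal sketch (Python, with names of my own choosing) that checks which characters the quoted character class accepts. The range %-? runs from 0x25 to 0x3F, so it covers both round brackets, which is why a value matched with this class can swallow a following "(22)".

```python
import re

# Character class exactly as quoted in this thread.
# PATTERN and allowed() are illustrative names, not part of any GS1 tool.
PATTERN = re.compile(r'[!%-?A-Z_a-z\x22]')

def allowed(ch: str) -> bool:
    """Return True if the single character ch is accepted by the class."""
    return PATTERN.fullmatch(ch) is not None

if __name__ == "__main__":
    # Round brackets fall inside the %-? range; '#', '@' and space do not.
    for ch in "()#@ ":
        print(repr(ch), allowed(ch))
```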

Since parentheses are permitted in Figure 7.11-1 of https://ref.gs1.org/standards/genspecs/ , you raise an interesting question. When an element string is written with parentheses around the AI keys, rather than without parentheses and with <FNC1> or <GS> after each AI value that is not of predefined length, how can a batch/lot identifier of 220707(22)AvBn220707 be distinguished from a batch/lot number of 220707 plus a consumer product variant of AvBn220707? The problem arises whenever the batch/lot value coincidentally contains a left round bracket, followed by 2-4 digits that correspond to a GS1 AI, followed by a right round bracket.

Writing (01)00883873867792(10)220707(22)AvBn220707(21)PC22412085 creates exactly such an ambiguity.
Using <FNC1> or <GS>, you can distinguish between:

010088387386779210220707(22)AvBn220707<FNC1>21PC22412085<FNC1>

for a batch/lot number (AI (10)) that happens to be 220707(22)AvBn220707 - and no CPV (AI (22)) expressed

versus

010088387386779210220707<FNC1>22AvBn220707<FNC1>21PC22412085<FNC1>

for a batch/lot number (AI (10)) of 220707 and separately a CPV (AI (22)) of AvBn220707
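To make the difference concrete, here is a rough sketch of how the two unbracketed messages above parse differently. It assumes a tiny hand-written AI table (only the four AIs in this thread, all with two-digit keys) and treats the literal text <FNC1> as the separator, as written in this comment; real scanner output typically carries the GS character (ASCII 29) instead. The names AI_LENGTHS and parse_message are hypothetical.

```python
FNC1 = "<FNC1>"  # textual placeholder used in this thread (assumption)

# Illustrative subset only: AI key -> fixed value length, or None if variable.
AI_LENGTHS = {"01": 14, "10": None, "21": None, "22": None}

def parse_message(msg: str) -> list[tuple[str, str]]:
    """Split an unbracketed message into (AI, value) pairs."""
    elements = []
    pos = 0
    while pos < len(msg):
        ai = msg[pos:pos + 2]
        if ai not in AI_LENGTHS:
            raise ValueError(f"unknown AI at offset {pos}: {ai!r}")
        pos += 2
        length = AI_LENGTHS[ai]
        if length is not None:
            # Fixed-length value: take exactly that many characters.
            value = msg[pos:pos + length]
            pos += length
        else:
            # Variable-length value: runs to the next separator or the end.
            end = msg.find(FNC1, pos)
            if end == -1:
                end = len(msg)
            value = msg[pos:end]
            pos = end
        if msg.startswith(FNC1, pos):
            pos += len(FNC1)  # consume the separator
        elements.append((ai, value))
    return elements

if __name__ == "__main__":
    a = "010088387386779210220707(22)AvBn220707<FNC1>21PC22412085<FNC1>"
    b = "010088387386779210220707<FNC1>22AvBn220707<FNC1>21PC22412085<FNC1>"
    print(parse_message(a))  # three elements, brackets stay inside AI 10
    print(parse_message(b))  # four elements, AI 22 is separate
```

Because the separator, not the bracket, ends a variable-length value, the "(22)" in the first message is just data.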

You may be highlighting a limitation or potential ambiguity in the use of parentheses to denote AI keys within element strings and a lack of guidance in the GS1 General Specifications about avoiding sequences of ( 2-4 digits ) within the values of batch/lot numbers, serial numbers, CPV etc.

I'm not aware of any unambiguous guidance for parsing ambiguous strings such as

(01)00883873867792(10)220707(22)AvBn220707(21)PC22412085

and within element strings, we don't have a way to escape a literal left/right round bracket.

If there is such guidance, I'm sure @terryburton can point us to it.

I suspect that the best we could do is to include an advisory note in a future version of GS1 Gen Specs regarding the potential pitfalls of including ( 2-4 digits ) within the structure of a batch/lot or serial number, though I can also understand that when such values are generated in a sparse non-sequential manner, this could arise - and in such cases, it might be necessary to add some filtering in the generation mechanism to reject that value and pick another instead.
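As a sketch of what such filtering could look like (an assumption on my part, not anything specified in the Gen Specs), a generator could reject any candidate value that contains a left round bracket, 2-4 digits, and a right round bracket. This is deliberately stricter than necessary, since it does not check whether the digits correspond to an assigned AI. The names AI_LIKE, is_safe_value and pick_safe are hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches "(" + 2 to 4 digits + ")" anywhere in a value.
AI_LIKE = re.compile(r'\(\d{2,4}\)')

def is_safe_value(value: str) -> bool:
    """True if the value contains nothing that looks like a bracketed AI key."""
    return AI_LIKE.search(value) is None

def pick_safe(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that passes the filter, or None."""
    for value in candidates:
        if is_safe_value(value):
            return value
    return None

if __name__ == "__main__":
    print(is_safe_value("220707(22)AvBn220707"))   # rejected by the filter
    print(pick_safe(["220707(22)AvBn", "220707-22-AvBn"]))
```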

Regarding the regular expression you originally asked about, you're probably aware that you can add a trailing ? after a quantifier to request non-greedy matching, which might partially help if the next character is required to be a left round bracket or the end of the string.
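A minimal sketch of that idea in Python follows. It pairs a non-greedy value match with a lookahead for either the next "(2-4 digits)" or the end of the string. The pattern and the name split_bracketed are my own illustration; the lookahead accepts any 2-4 digits rather than checking for a valid AI, and the approach simply picks one reading of an ambiguous string rather than resolving the ambiguity.

```python
import re

# "(AI)" followed by the shortest run of permitted characters that is
# itself followed by another "(2-4 digits)" or by the end of the string.
ELEMENT = re.compile(r'\((\d{2,4})\)([!%-?A-Z_a-z\x22]+?)(?=\(\d{2,4}\)|$)')

def split_bracketed(s: str) -> list[tuple[str, str]]:
    """Split a bracketed element string into (AI, value) pairs."""
    return [(m.group(1), m.group(2)) for m in ELEMENT.finditer(s)]

if __name__ == "__main__":
    s = "(01)00883873867792(10)220707(22)AvBn220707(21)PC22412085"
    for ai, value in split_bracketed(s):
        print(ai, value)
```

With this pattern AI 22 is always split out, so a batch/lot value that genuinely contains "(22)" would be misread.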

I'm not aware of any unambiguous guidance for parsing ambiguous strings such as
... snip ...
If there is such guidance, I'm sure @terryburton can point us to it.

Refer to the API docs for the GS1 Syntax Engine which denote that in-data "(" characters shall be escaped by means of a backslash within inputs to gs1_encoder_setAIdataStr():

Any "(" characters in AI element values must be escaped as "\(" to avoid conflating them with the start of the next AI.

Ref: https://gs1.github.io/gs1-syntax-engine/#abdce5a9347ff7cbbd153b1a13e3273dd
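To illustrate the escaping convention described in that note, here is a small Python sketch of a reader for bracketed input in which "\(" denotes a literal left bracket. This is not the Syntax Engine's implementation and does not call its API; parse_escaped and AI_OPEN are hypothetical names, and no validation of AI keys or value contents is attempted.

```python
import re

# An AI key in bracketed notation: "(" + 2 to 4 digits + ")".
AI_OPEN = re.compile(r'\((\d{2,4})\)')

def parse_escaped(s: str) -> list[tuple[str, str]]:
    """Parse bracketed AI data where a literal "(" in a value is written "\\("."""
    elements = []
    pos = 0
    while pos < len(s):
        m = AI_OPEN.match(s, pos)
        if m is None:
            raise ValueError(f"expected '(AI)' at offset {pos}")
        ai = m.group(1)
        pos = m.end()
        value = []
        while pos < len(s):
            if s.startswith("\\(", pos):
                value.append("(")   # escaped bracket is plain data
                pos += 2
            elif s[pos] == "(":
                break               # unescaped bracket starts the next AI
            else:
                value.append(s[pos])
                pos += 1
        elements.append((ai, "".join(value)))
    return elements

if __name__ == "__main__":
    escaped = r"(01)00883873867792(10)220707\(22)AvBn220707(21)PC22412085"
    plain = "(01)00883873867792(10)220707(22)AvBn220707(21)PC22412085"
    print(parse_escaped(escaped))  # AI 10 keeps "(22)" inside its value
    print(parse_escaped(plain))    # AI 22 is a separate element
```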

Thanks, @terryburton
That certainly makes sense for the Barcode Syntax Dictionary and its tool / engine.

I was actually referring to a lack of (my awareness of) such guidance within the GS1 Gen Specs when literal round brackets appear within batch/lot or serial numbers when element strings are expressed in round bracket notation for the AI keys.

I was thinking that an additional note in section 4.14 of GS1 Gen Specs after 2c in the section on Font and legibility (which mentions parentheses around AI keys) could be helpful.

I was actually referring to a lack of (my awareness of) such guidance within the GS1 Gen Specs when literal round brackets appear within batch/lot or serial numbers when element strings are expressed in round bracket notation for the AI keys.

I don't recall any such guidance. As far as systematic representation of bracketed element strings is concerned (both for input and output), each software solution that I'm aware of beats its own path: the GS1 Syntax Engine requires backslash encoding; BWIPP / BWIP-JS require that in-data parentheses are encoded via their ASCII ordinal (using the parse option); Zint resorts to square brackets for AIs, ...

One might argue that barcode message data (FNC1 in first format) is the appropriate syntax for data exchange, but bad barcode data arising from improper human and application encoding of this syntax is the number one issue I face on an almost weekly basis. Automatic encoding based on user-supplied bracketed syntax or AI builder workflows leads to much better data quality (with any half-decent encoder).

I was thinking that an additional note in section 4.14 of GS1 Gen Specs after 2c in the section on Font and legibility (which mentions parentheses around AI keys) could be helpful.

I agree. There are the issues of (1) clarity for packaging purposes and maybe also (2) providing an unambiguous syntax for communication purposes (data entry and output).

I'm not sure whether the GenSpecs address the latter, since bracketed syntax isn't promoted for data exchange. However, it would help applications that handle AI data for user inputs (e.g. label generators) to harmonise on a standard approach in the future.

Hi all, thank you for the helpful feedback and discussion. I'm unsure whether this can be addressed in the short term within the GenSpecs, but I have noted the topic for consideration within some upcoming work on identification (which includes the syntax topic). Keeping the ticket open for visibility.