moosetechnology/PetitParser

Grammar for URI IPv6address unexpectedly fails

Closed this issue · 2 comments

The following is based on the rule IPv6address given in appendix A in RFC 3986 (“Collected ABNF for URI”):

address := ((#digit asParser , $: asParser) max: 5) , #digit asParser , '::' asParser , #digit asParser.

This assertion passes as expected:

self assert: (address end parse: '1:2:3:4:5:6::7')
	= #(#(#($1 $:) #($2 $:) #($3 $:) #($4 $:) #($5 $:)) $6 '::' $7).

But this assertion fails, because #parse: returns a PPFailure:

self assert: (address end parse: '1:2:3:4:5::6')
	= #(#(#($1 $:) #($2 $:) #($3 $:) #($4 $:)) $5 '::' $6).

Perhaps my expectation is wrong, but shouldn’t that parse successfully as well?

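It fails because the pattern that follows the max: (#digit) can also be consumed by the max: repetition itself. The repetition is greedy and does not backtrack, so on '1:2:3:4:5::6' it takes all five digit-colon pairs, including '5:', and the #digit after it then sees ':' and fails.
To see this, let's make the parser before '::' different: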

((#digit asParser , $: asParser) max: 5) , #letter asParser , '::' asParser , #digit asParser

This successfully parses '1:2:3:4:5:a::7', '1:2:3:4:a::6', '1:2:3:a::5', etc.
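The greedy behaviour can be observed by running the repetition on its own. This is a minimal sketch, assuming a Pharo playground with PetitParser loaded; the variable names pair, repetition and address are only illustrative.

```smalltalk
| pair repetition address |
pair := #digit asParser , $: asParser.
repetition := pair max: 5.

"Without #end, #parse: returns whatever the parser consumed.
 The repetition takes all five pairs, including '5:'."
(repetition parse: '1:2:3:4:5::6') size. "=> 5"

"The #digit that follows therefore sees ':' and fails; the
 repetition is not asked to give back its last pair."
address := repetition , #digit asParser , '::' asParser , #digit asParser.
(address end parse: '1:2:3:4:5::6') isPetitFailure. "=> true"
```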

What you can do is make sure the pattern after the max: differs from what the repetition consumes:

((#digit asParser , $: asParser) max: 6) , ':' asParser , #digit asParser

This works for both of your examples.

If you want to require at least one digit before '::', put it at the beginning:

#digit asParser , $: asParser , ((#digit asParser , $: asParser) max: 5) , ':' asParser , #digit asParser

OK, I see, I should first rewrite the original, the eighth alternative from the rule for IPv6address in appendix A in RFC 3986:

  [ *5( h16 ":" ) h16 ] "::" h16
↳ ( *5( h16 ":" ) h16 "::" h16 ) / ( "::" h16 )
↳ ( *5( h16 ":" ) ( h16 ":" ) ":" h16 ) / ( "::" h16 )
↳ ( 1*6( h16 ":" ) ":" h16 ) / ( "::" h16 )

So that with PetitParser it becomes:

ipv6AddressAlternative8 := (((h16 , $: asParser) min: 1 max: 6) , $: asParser , h16) / ('::' asParser , h16).

Then both of the examples match (assuming, for simplicity, h16 := #digit asParser):

ipv6AddressAlternative8 end matches: '1:2:3:4:5:6::7'. "=> true"
ipv6AddressAlternative8 end matches: '1:2:3:4:5::6'. "=> true"
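A few boundary cases help confirm that the rewritten rule still behaves like the ABNF original. The following is a sketch under the same simplifying assumption (h16 := #digit asParser); the inputs are chosen here for illustration and are not from the RFC.

```smalltalk
| h16 ipv6AddressAlternative8 |
h16 := #digit asParser.
ipv6AddressAlternative8 :=
	(((h16 , $: asParser) min: 1 max: 6) , $: asParser , h16)
		/ ('::' asParser , h16).

"Optional prefix absent: handled by the second choice."
ipv6AddressAlternative8 end matches: '::7'. "=> true"

"Shortest non-empty prefix: one h16 before '::'."
ipv6AddressAlternative8 end matches: '1::2'. "=> true"

"Seven h16 before '::' exceeds the limit of six."
ipv6AddressAlternative8 end matches: '1:2:3:4:5:6:7::8'. "=> false"

"No '::' at all."
ipv6AddressAlternative8 end matches: '1:2'. "=> false"
```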

I will close this issue, though this may be worth noting somewhere as something to take into account when translating from ABNF. Of course, even better would be to have a PPABNFParser that automates the translation, something like:

((PPABNFParser parse: 'example-rule = [ *5( DIGIT ":" ) DIGIT ] "::" DIGIT')
    compileClass: #PPExampleParser) new end matches: '1:2:3:4:5::6' "=> true"