Consider Unicode Identifiers
wollmers opened this issue · 5 comments
Not that I myself would ever use it, but specifying the allowed characters for identifiers differently from Perl would confuse users.
This is the definition for identifiers in https://perldoc.perl.org/perldata:
/ (?[ ( \p{Word} & \p{XID_Start} ) + [_] ])
(?[ ( \p{Word} & \p{XID_Continue} ) ]) * /x
IMHO a BNF extended with notation that conforms to Unicode is still well specified.
Same for
METHODNAME ::= [a-zA-Z_]\w*
which has \w as continuation characters. Under use utf8, \w is the same as \p{Word} and is not the same as [a-zA-Z0-9_].
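The difference is easy to check. Below is a minimal sketch, assuming Perl 5.18 or later; the helper names `matches_spec_pattern` and `matches_ascii_pattern` and the sample names are made up for illustration and are not part of the Corinna spec:

```perl
use v5.18;      # enables strict and the unicode_strings feature
use warnings;
use utf8;       # this source file contains non-ASCII string literals

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: the METHODNAME pattern exactly as the spec writes it.
sub matches_spec_pattern  { $_[0] =~ /\A[a-zA-Z_]\w*\z/ ? 1 : 0 }

# Hypothetical helper: the ASCII-only reading a user might assume for \w.
sub matches_ascii_pattern { $_[0] =~ /\A[a-zA-Z_][a-zA-Z0-9_]*\z/ ? 1 : 0 }

for my $name ('get_name', 'größe', 'ярлык') {
    printf "%-10s spec=%d ascii=%d\n", $name,
        matches_spec_pattern($name), matches_ascii_pattern($name);
}
```

The second sample starts with an ASCII letter and continues with non-ASCII letters, so the two patterns disagree on it. The third starts with a Cyrillic letter, which the ASCII-only start class `[a-zA-Z_]` rejects in both patterns.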
I would love to see something like this, but I suspect the scope would be far outside of Corinna and would likely complicate parsing. I also strongly suspect that P5P would reject something like this. For now, this is outside the scope of V1. Sorry.
While the spec says things like [a-zA-Z_]\w*, I expect that in implementation it would follow the standard rules for Perl identifiers, which do allow Unicode.
> While the spec says things like [a-zA-Z_]\w*, I expect that in implementation it would follow the standard rules for Perl identifiers, which do allow Unicode.
That's what I also expected: that P5P will not define an extra parser for Cor. Since 5.18, under use utf8, identifiers are defined as follows (see https://perldoc.perl.org/perldata#Identifier-parsing):
/ (?[ ( \p{Word} & \p{XID_Start} ) + [_] ])
(?[ ( \p{Word} & \p{XID_Continue} ) ]) * /x
This uses Unicode properties made exactly for identifiers, where XID_Start also contains letters outside ASCII or Latin, and '_' is added.
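To see what that pattern accepts, here is a minimal sketch, assuming Perl 5.18 or later. The pattern is the one quoted from perldata, anchored with `\A` and `\z`; the helper name `is_identifier` and the sample strings are hypothetical:

```perl
use v5.18;      # enables strict and the unicode_strings feature
use warnings;
no warnings 'experimental::regex_sets';   # (?[ ... ]) warned as experimental before 5.36
use utf8;       # this source file contains non-ASCII string literals

binmode STDOUT, ':encoding(UTF-8)';

# Identifier pattern as quoted from perldata, anchored to the whole string.
my $identifier = qr/
    \A
    (?[ ( \p{Word} & \p{XID_Start} ) + [_] ])
    (?[ ( \p{Word} & \p{XID_Continue} ) ]) *
    \z
/x;

# Hypothetical helper, for illustration only.
sub is_identifier { $_[0] =~ $identifier ? 1 : 0 }

for my $name ('foo_bar', '_private', 'größe', 'ярлык', '名前', '2fast', 'a-b') {
    printf "%-10s %s\n", $name,
        is_identifier($name) ? 'identifier' : 'not an identifier';
}
```

Unlike `[a-zA-Z_]\w*`, this pattern also accepts names that start with a non-ASCII letter, while still rejecting a leading digit or a hyphen.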
That's the definition of \p{Word}, which is the same as \w, in https://unicode.org/reports/tr18/#Default_Word_Boundaries:
\p{alpha}
\p{gc=Mark}
\p{digit}
\p{gc=Connector_Punctuation}
\p{Join_Control}
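As a rough check of that list against Perl itself, the sketch below tests one sample code point per component, plus a hyphen for contrast. It assumes Perl 5.18 or later; the choice of sample code points and the helper name `is_word_char` are mine, for illustration only:

```perl
use v5.18;      # enables strict and the unicode_strings feature
use warnings;

# One sample code point per component listed above, plus a non-word character.
my @samples = (
    [ 'alpha: U+00E9 LATIN SMALL LETTER E WITH ACUTE', "\x{E9}"   ],
    [ 'gc=Mark: U+0301 COMBINING ACUTE ACCENT',        "\x{301}"  ],
    [ 'digit: U+0663 ARABIC-INDIC DIGIT THREE',        "\x{663}"  ],
    [ 'Connector_Punctuation: U+203F UNDERTIE',        "\x{203F}" ],
    [ 'Join_Control: U+200D ZERO WIDTH JOINER',        "\x{200D}" ],
    [ 'not a word character: U+002D HYPHEN-MINUS',     "-"        ],
);

# Hypothetical helper: does a single character match \w?
sub is_word_char { $_[0] =~ /\A\w\z/ ? 1 : 0 }

for my $sample (@samples) {
    my ($label, $char) = @$sample;
    my $in_word_property = $char =~ /\A\p{Word}\z/ ? 1 : 0;
    printf "%-50s \\w=%d \\p{Word}=%d\n",
        $label, is_word_char($char), $in_word_property;
}
```

The two columns should agree on every row, since \w and \p{Word} are meant to be the same set here.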
> That's what I also expected, that P5P will not define an extra parser for Cor.
Indeed so. I was fully intending to just continue to use core bits and pieces for as much of this as possible, for consistency, rather than rebuild entire new things from scratch. I'm viewing the spec very much as a hand-wavy suggestion in this sense.