SMLFamily/Successor-ML

Unicode in SML text


As an extension of #29, allow all characters with Unicode general category Letter or Number to be used in alphanumeric identifiers (*), all characters with general category Symbol to be used in symbolic identifiers, and all characters with general category Number to be used in numbers (**). Also take care of the additional end-of-line characters in Unicode.

(*) As is already the case, the underscore would also be allowed, and numbers would not be allowed as the first character.

(**) Optionally, and not Unicode related: it would be nice to allow underscores in numbers as Ada does, since it helps readability, e.g. 1_234_567
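To make the identifier part of the proposal concrete, here is a rough sketch in Python (chosen only because its standard `unicodedata` module exposes the general category; as far as I know the SML Basis Library has no equivalent). The function names `is_alnum_id` and `is_symbolic_id` are made up for illustration. The sketch assumes the first character of an alphanumeric identifier must be a letter, ignores primes, and keeps the existing ASCII symbol characters explicitly, since many of them are classified by Unicode as punctuation rather than symbols.

```python
import unicodedata

# ASCII symbol characters of the current Definition. Many of them are
# Unicode punctuation (P*), not symbols (S*), so they are listed explicitly.
SML_ASCII_SYMBOLS = set("!%&$#+-/:<=>?@\\~`^|*")

def is_alnum_id(s: str) -> bool:
    """Hypothetical rule: the first char is a letter (L*); the rest are
    letters (L*), numbers (N*) or '_'. Primes are ignored in this sketch."""
    if not s or not unicodedata.category(s[0]).startswith("L"):
        return False
    return all(c == "_" or unicodedata.category(c)[0] in "LN" for c in s[1:])

def is_symbolic_id(s: str) -> bool:
    """Hypothetical rule: every char is an existing ASCII symbol character
    or has Unicode general category Symbol (S*)."""
    return bool(s) and all(
        c in SML_ASCII_SYMBOLS or unicodedata.category(c).startswith("S")
        for c in s
    )

# Print (identifier, alphanumeric?, symbolic?) for a few candidates.
for ident in ["x_1", "λ1", "1x", "→", "<=", "a+"]:
    print(ident, is_alnum_id(ident), is_symbolic_id(ident))
```

A real lexer would of course also have to deal with reserved words, primes, and the source encoding; this only illustrates the category tests.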

Note that (**) is already part of the Successor ML specification (and is implemented by both SML/NJ and MLton). For example:

% sml -Cparser.succ-ml=true
Standard ML of New Jersey v110.81 [built: Tue May  2 11:51:11 2017]
- 123_456;
val it = 123456 : int
- 

Thanks. I knew I had seen it in an SML compiler, but I did not suspect it was already part of the standard.

About my original message: I was thinking that using all characters with general category Number may not be a good idea. I suspected the character “²” belongs to this category, and if 3² were read as 32, that would be ambiguous. So I checked, and “²” belongs to Other Number. So: the Number category, but excluding the Other Number subcategory (this is worth stressing).
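The category of “²” and of a few other number-like characters can be checked with Python's standard `unicodedata` module. The helper `allowed_in_number` below is a hypothetical name, and it encodes only the rule suggested above (Number, minus Other Number); it is a sketch, not a proposal for the actual lexer.

```python
import unicodedata

def allowed_in_number(ch: str) -> bool:
    """Hypothetical rule: accept Decimal Number (Nd) and Letter Number (Nl),
    reject Other Number (No) such as superscripts and fractions."""
    return unicodedata.category(ch) in {"Nd", "Nl"}

# Print the general category and character name for a few candidates.
for ch in ["7", "٣", "Ⅻ", "²", "½"]:
    print(ch, unicodedata.category(ch), unicodedata.name(ch), allowed_in_number(ch))
```

Under this rule “²” and “½” are rejected, while ASCII digits, digits from other scripts, and letter numbers such as Roman numerals are accepted. Whether Letter Number should really be allowed in numeric literals is a separate question that the rule above leaves open.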