PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols
Many tests are based on the Unicode® database as well as the Unicode tools from Perl 6. Not all methods and functions are in place yet: uniprop, uniprop-bool, uniprop-int and uniprop-str are not yet available on the JVM. Perl 6 also seems to be based on Unicode version 8.0.0, with 9.0.0 scheduled, although parts of version 9.0.0 are already working.
use Unicode::PRECIS;
use Unicode::PRECIS::Identifier::UsernameCaseMapped;

# Create a profile object for case-mapped usernames (rfc7613)
my Unicode::PRECIS::Identifier::UsernameCaseMapped $uname-profile .= new;
my Str $username = "نجمة-الصباح";

# The branches below expect a Str on success and a Bool on rejection
my TestValue $tv = $uname-profile.enforce($username);
if $tv ~~ Str {
  say "Username $username accepted but converted to $tv";
}
elsif $tv ~~ Bool {
  say "Username not accepted";
}
I started by studying rfc4013 for SASLprep, then realized it is a profile based on Stringprep, which is specified in rfc3454. Both are obsoleted, by rfc7613 and rfc7564 respectively, because they are tied to Unicode version 3.2. The newer RFCs are specified to be independent of any Unicode version.
- rfc3454 - Preparation of Internationalized Strings ("stringprep").
- rfc7564 - PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols. Obsoletes rfc3454.
- rfc4013 - SASLprep: Stringprep Profile for User Names and Passwords
- rfc7613 - Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords. Obsoletes rfc4013.
- rfc5892 - The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)
- rfc5893 - Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)

Several files found at the Unicode® Character Database are used to generate the tables needed to find the proper character classes.
- UnicodeData.txt
- Whole zip file UCD.zip of version 9.0.0 including UnicodeData.txt
- Unicode Data File Format
- Unicode Normalization Forms #15
- Unicode Character Database #44
- East Asian Width #11
- Unicode Bidirectional Algorithm #9
Perl 6 uses graphemes as the base for the Str string type. These are the visible entities which show as a single symbol and are counted as such by the Str.chars method. From a string, the names and numeric values of its characters can be obtained with the methods uniname, uninames, unival and univals, and normal forms can be generated with NFC, NFD, NFKC and NFKD. Furthermore, strings can be encoded to utf-8.
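As a small illustration of the difference between graphemes and code points, here is a minimal sketch using only built-in string methods. The variable names are made up for the example and nothing in it comes from the Unicode::PRECIS modules.

```perl6
# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points, one grapheme
my Str $s = "e\x[301]";

my Int $graphemes  = $s.chars;                   # counted as visible symbols
my Int $nfc-codes  = $s.NFC.elems;               # code points after canonical composition
my Int $nfd-codes  = $s.NFD.elems;               # code points after canonical decomposition
my Int $utf8-bytes = $s.encode('utf-8').bytes;   # size of the utf-8 encoding

say "graphemes:       $graphemes";
say "NFC code points: $nfc-codes";
say "NFD code points: $nfd-codes";
say "utf-8 bytes:     $utf8-bytes";
say "names:           ", $s.uninames.join(', ');
```

The string counts as one character for Str.chars, while the NFD form holds the base letter and the combining accent as separate code points.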
This project is tested with the latest Rakudo built on MoarVM, implementing Perl v6.c.
First the basis of the PRECIS framework will be built. A profile for usernames and passwords follows as soon as possible, because this is my first need. When this works well enough, other profiles can be added. Much of it is now implemented.
Naming of modules:
- Unicode::PRECIS using rfc7564
- Unicode::PRECIS::Identifier using rfc7564
- Unicode::PRECIS::Identifier::UsernameCaseMapped using rfc7613
- Unicode::PRECIS::Identifier::UsernameCasePreserved using rfc7613
- Unicode::PRECIS::Freeform using rfc7564
- Unicode::PRECIS::Freeform::OpaqueString using rfc7613
Marcel Timmerman: translation of the modules for Perl 6
MARTIMM on github