fn/matches.re.xml: re00984 unicode-version
zadean opened this issue · 2 comments
Test re00984 tests a large number of code-points for the \w character sequence.
Characters ⌈ and ⌉ are in this list. These codepoints were moved from \p{S} to \p{P} in unicode version 6.3, and therefore out of the \w character sequence.
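For anyone who wants to check this locally, here is a minimal sketch in Python. It assumes the XSD definition of \w (every character except those in \p{P}, \p{Z} and \p{C}) and relies on whatever Unicode data the Python build ships with, so the result depends on that version. The helper name is_xsd_word_char is made up for illustration and is not part of the test suite or of any processor.

```python
import unicodedata


def is_xsd_word_char(ch: str) -> bool:
    """Approximate XSD \\w: any character whose general category
    is not Punctuation (P*), Separator (Z*) or Other (C*)."""
    return unicodedata.category(ch)[0] not in "PZC"


# Show which Unicode version this interpreter's character data is based on.
print("Unicode data version:", unicodedata.unidata_version)

# U+2308 LEFT CEILING and U+2309 RIGHT CEILING, the two characters in question.
for ch in "\u2308\u2309":
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_xsd_word_char(ch))
```

With Unicode data at 6.3 or later, both characters should report a punctuation category (Ps and Pe) and so fall outside \w. A processor built on 6.2 or earlier data would still see them as symbols and match them, which is why the test result depends on the Unicode version.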
Perhaps the test should include the "unicode-version" dependency flag for version "6.2"?
@michaelhkay You make a very good point, and a separate test for the reclassified characters is definitely the better answer.
I took a quick look through the notes for the unicode updates since 6.3 and only found a few more category changes, but none that seem to break things in the current test suite as it stands.
Just a side note:
It may also be of interest to "modernize" a bit by adding some of the new emoji/emoticon codepoints to the \p{So} tests (re00169 & re00207). I imagine they are showing up in real data, and adding them would add value to the test cases. Not that this suite is a unicode test-suite, but just a few to show some level of compliance for the newer characters. But that is something for a different issue.