fn/matches.re.xml: re00984 unicode-version

Question

zadean opened this issue 5 years ago · 2 comments

Test re00984 checks a large number of codepoints against the \w multi-character escape.
The characters ⌈ (U+2308) and ⌉ (U+2309) are in this list. These codepoints were moved from \p{S} to \p{P} in Unicode version 6.3, which takes them out of the set matched by \w.

Perhaps the test should include the "unicode-version" dependency flag for version "6.2"?
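
To make the reclassification concrete, here is a minimal Python sketch. It assumes the XSD-style definition of \w as every character whose general category is not P, Z, or C; the helper name `in_xsd_word_class` is made up for illustration. Python's bundled Unicode 3.2.0 database stands in for a pre-6.3 version (it is not 6.2 itself), and the result for the current database depends on the Python build.

```python
import unicodedata


def in_xsd_word_class(ch, db=unicodedata):
    """Return True if ch falls in \\w under the assumed definition:
    any character whose general category is not P*, Z*, or C*."""
    return db.category(ch)[0] not in ("P", "Z", "C")


# Unicode 3.2.0 database shipped with Python, used here only as a
# pre-6.3 stand-in (assumption: it is not the 6.2 database itself).
old_db = unicodedata.ucd_3_2_0

for ch in ("\u2308", "\u2309"):  # LEFT CEILING, RIGHT CEILING
    print(
        f"U+{ord(ch):04X}",
        "old:", old_db.category(ch), in_xsd_word_class(ch, old_db),
        "current:", unicodedata.category(ch), in_xsd_word_class(ch),
    )
```

With a current database at 6.3 or later, both characters should report a punctuation category and fall outside \w, while the older database still treats them as symbols.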

Answer 1 · 2019-08-19T17:39:31.000Z

It would be a shame to put that dependency on the whole test - better to move the relevant part into a separate test with a dependency.

Michael Kay


Answer 2 · 2019-08-20T18:53:35.000Z

@michaelhkay You make a very good point, and a separate test for the reclassified characters is definitely the better answer.

I took a quick look through the release notes for the Unicode versions since 6.3 and found only a few more category changes, none of which seem to break anything in the test suite as it stands.

Just a side note:
It may also be of interest to "modernize" a bit by adding some of the new emoji/emoticon codepoints to the \p{So} tests (re00169 & re00207). I imagine they are showing up in real data, and adding them would add value to the test cases. Not that this suite is a Unicode test suite, but just a few to show some level of compliance for the newer characters. But that is something for a different issue.
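
As a rough way to shortlist such codepoints, the following Python sketch filters a few candidates by general category. The candidate list and the helper name `so_candidates` are hypothetical, chosen only for illustration; which characters are recognised depends on the Unicode database bundled with the local Python build, so newer emoji may be reported as unassigned on older builds.

```python
import unicodedata

# Hypothetical candidates, picked only for illustration.
CANDIDATES = ["\U0001F600", "\U0001F914", "\U0001F929"]


def so_candidates(chars):
    """Return (codepoint label, name) pairs for the characters that the
    local Unicode database places in general category So."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in chars
        if unicodedata.category(ch) == "So"
    ]


print("Unicode database:", unicodedata.unidata_version)
for label, name in so_candidates(CANDIDATES):
    print(label, name)
```

Anything that survives the filter would be a candidate for the \p{So} tests, presumably with a matching "unicode-version" dependency for the release that introduced it.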