[ucd/normal] Characters 가 through 힣 have the wrong Decomposition_Type
CAD97 opened this issue · 8 comments
See #27 which adds a failing test.
11172 test cases failed! (1100892 passed) {
0: Fail { line_num: Some(234), char: '가', exp_dt: Some(Canonical), actual_dt: None }
...
11171: Fail { line_num: Some(234), char: '힣', exp_dt: Some(Canonical), actual_dt: None }
}
Relevant line from DecompositionType.txt:
AC00..D7A3 ; Canonical # Lo [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
Great find! Yeah, Hangul syllables get special handling that's done algorithmically rather than based on the data files. We need to implement it.
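For context, here is a minimal sketch of the arithmetic Hangul decomposition described in the Unicode Standard (section 3.12, Conjoining Jamo Behavior). The constant values come from the standard; the constant names and the `decompose_hangul` helper are illustrative assumptions and may not match what `hangul.rs` in this repository actually defines.

```rust
// Constants from the Unicode Standard, section 3.12.
// Names are illustrative; the crate's hangul.rs may spell them differently.
const S_BASE: u32 = 0xAC00; // first precomposed syllable (가)
const L_BASE: u32 = 0x1100; // first leading consonant
const V_BASE: u32 = 0x1161; // first vowel
const T_BASE: u32 = 0x11A7; // one before the first trailing consonant
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172

/// Hypothetical helper: splits a precomposed Hangul syllable into its
/// leading consonant, vowel, and optional trailing consonant.
/// Returns `None` for any character outside U+AC00..=U+D7A3.
fn decompose_hangul(c: char) -> Option<(char, char, Option<char>)> {
    let s_index = (c as u32).checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None;
    }
    let l = char::from_u32(L_BASE + s_index / N_COUNT)?;
    let v = char::from_u32(V_BASE + (s_index % N_COUNT) / T_COUNT)?;
    let t_index = s_index % T_COUNT;
    // A trailing index of 0 means the syllable has no trailing consonant.
    let t = if t_index == 0 {
        None
    } else {
        char::from_u32(T_BASE + t_index)
    };
    Some((l, v, t))
}
```

Because the mapping is pure arithmetic, none of the 11172 syllables appear individually in the decomposition data, which would explain why a purely table-driven lookup reports no decomposition type for them.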
@calum, this is a good starter task. What you need to do is use the const values in https://github.com/behnam/rust-unic/blob/master/unic/ucd/normal/src/hangul.rs to add a check to the DecompositionType implementation, so that it returns the correct value for Hangul chars.
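A rough sketch of what such a check could look like follows. Everything here is an assumption made for illustration: the `DecompositionType` enum is a simplified stand-in for the crate's real type, `lookup_in_table` is a placeholder for the existing data-file lookup, and `S_BASE` / `S_COUNT` are the standard Hangul syllable constants, which may be named differently in `hangul.rs`.

```rust
/// Simplified stand-in for the crate's real enum (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DecompositionType {
    Canonical,
    Compat,
}

// Standard Hangul syllable block: U+AC00..=U+D7A3, 11172 code points.
const S_BASE: u32 = 0xAC00;
const S_COUNT: u32 = 11_172;

/// Hypothetical helper: true for precomposed Hangul syllables.
fn is_hangul_syllable(c: char) -> bool {
    (S_BASE..S_BASE + S_COUNT).contains(&(c as u32))
}

/// Placeholder for the existing table-driven lookup (assumption).
/// Here it knows a single compatibility mapping, U+00A0 NO-BREAK SPACE.
fn lookup_in_table(c: char) -> Option<DecompositionType> {
    match c {
        '\u{00A0}' => Some(DecompositionType::Compat),
        _ => None,
    }
}

/// Checks the algorithmic Hangul range first, then falls back to the table.
fn decomposition_type(c: char) -> Option<DecompositionType> {
    if is_hangul_syllable(c) {
        return Some(DecompositionType::Canonical);
    }
    lookup_in_table(c)
}
```

With a check like this, `decomposition_type('가')` and `decomposition_type('힣')` would both yield `Some(Canonical)`, matching the `exp_dt` values in the failing test output, while characters just outside the block (U+ABFF, U+D7A4) still go through the table.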
To test your work, you need to remove the #[should_panic(...)] line from the conformance test (https://github.com/behnam/rust-unic/blob/master/unic/ucd/normal/tests/conformance_tests.rs#L34), expecting all the tests to pass.
What do you think?
Sounds good! I'll get to work. Thanks for the tips.
@calum, this is the main code-related issue blocking the UNIC-0.5 release. Do you think you can submit a PR today or early tomorrow, or would you mind if I fix this myself and find you a new issue next week?
Btw, sorry for not mentioning that before. I just set up the milestone tags for all major releases, so we can better communicate these matters.
Hi, sorry I won't be able to get it finished this weekend. I've had family visiting this week so I haven't had time unfortunately.
I'll find another issue to work on next week.