open-i18n/rust-unic

[ucd/normal] Characters 가 through 힣 have the wrong Decomposition_Type

CAD97 opened this issue · 8 comments

CAD97 commented

See #27, which adds a failing test.

11172 test cases failed! (1100892 passed) {

0: Fail { line_num: Some(234), char: '가', exp_dt: Some(Canonical), actual_dt: None }

...

11171: Fail { line_num: Some(234), char: '힣', exp_dt: Some(Canonical), actual_dt: None }

}
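The two counts in this output add up to 1,112,064. That is the number of Unicode scalar values: U+0000..=U+10FFFF minus the 2,048 surrogates U+D800..=U+DFFF. The test source is not shown here, but this suggests the conformance test checks every `char`. The short sketch below only confirms that arithmetic. The helper name `count_scalar_values` is hypothetical.

```rust
/// Counts every Unicode scalar value, i.e. every code point that
/// `char::from_u32` accepts (U+0000..=U+10FFFF minus the surrogates).
fn count_scalar_values() -> usize {
    (0..=0x10FFFFu32).filter_map(char::from_u32).count()
}

fn main() {
    // Counts copied from the test output in this issue.
    let failed = 11_172;
    let passed = 1_100_892;
    let total = count_scalar_values();
    println!("failed + passed = {}, scalar values = {}", failed + passed, total);
    assert_eq!(failed + passed, total);
}
```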

Relevant line from DecompositionType.txt:

AC00..D7A3    ; Canonical # Lo [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
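The range on this line matches the Hangul syllable constants in Section 3.12 ("Conjoining Jamo Behavior") of the Unicode Standard. There are 19 leading consonants, 21 vowels and 28 trailing positions, where the 28 include "no trailing consonant". 19 × 21 × 28 = 11,172 syllables, starting at U+AC00. The sketch below checks that arithmetic. The constant names follow the standard and are not necessarily the ones used in unic's hangul.rs.

```rust
// Hangul syllable constants from the Unicode Standard, Section 3.12.
// Names follow the standard; unic's hangul.rs may name them differently.
const S_BASE: u32 = 0xAC00;
const L_COUNT: u32 = 19; // leading consonants (choseong)
const V_COUNT: u32 = 21; // vowels (jungseong)
const T_COUNT: u32 = 28; // trailing consonants (jongseong), including "none"
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172

fn main() {
    let last = S_BASE + S_COUNT - 1;
    println!("U+{:04X}..U+{:04X} = {} syllables", S_BASE, last, S_COUNT);
    // The last syllable is U+D7A3 HANGUL SYLLABLE HIH.
    assert_eq!(last, 0xD7A3);
    assert_eq!(char::from_u32(S_BASE), Some('가'));
    assert_eq!(char::from_u32(last), Some('힣'));
}
```

Running it prints `U+AC00..U+D7A3 = 11172 syllables`, which agrees with the `[11172]` count in the data file line.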

Great find! Yeah, Hangul syllables are handled specially: their decompositions are computed algorithmically rather than listed in the data files. We need to implement that.
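For reference, Section 3.12 of the Unicode Standard decomposes a precomposed Hangul syllable by pure arithmetic on its offset from U+AC00. The following is a minimal standalone sketch of that algorithm. It is not unic's code, and the function name `decompose_hangul` is hypothetical.

```rust
// Constants from the Unicode Standard, Section 3.12.
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7;
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT;
const S_COUNT: u32 = L_COUNT * N_COUNT;

/// Full canonical decomposition of a precomposed Hangul syllable into
/// conjoining jamo (L, V, optional T). Returns None for any other char.
fn decompose_hangul(c: char) -> Option<(char, char, Option<char>)> {
    // Offset from U+AC00; chars below the block have no offset.
    let s_index = (c as u32).checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None;
    }
    let l = L_BASE + s_index / N_COUNT;
    let v = V_BASE + (s_index % N_COUNT) / T_COUNT;
    let t_index = s_index % T_COUNT;
    // A trailing index of 0 means the syllable has no final consonant.
    let t = if t_index == 0 { None } else { char::from_u32(T_BASE + t_index) };
    Some((char::from_u32(l)?, char::from_u32(v)?, t))
}

fn main() {
    for c in ['가', '각', '힣', 'A'] {
        println!("{} -> {:?}", c, decompose_hangul(c));
    }
}
```

For example, '가' (U+AC00) decomposes to U+1100 U+1161, and '힣' (U+D7A3) decomposes to U+1112 U+1175 U+11C2. Every syllable in the block has a canonical decomposition, so all 11,172 of them have Decomposition_Type=Canonical.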

@calum, this is a good starter task. Use the const values in https://github.com/behnam/rust-unic/blob/master/unic/ucd/normal/src/hangul.rs to add a check to the DecompositionType implementation, so that it returns the correct value for Hangul characters.
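Below is a minimal sketch of the kind of check being suggested. It assumes a simplified `DecompositionType` enum and a stand-in `table_lookup` function. Both are hypothetical, and unic's actual types, generated tables, and hangul.rs constant names may differ. The Hangul range test runs before the table lookup. The test output above shows the existing lookup returning `None` for these characters, and the range test makes all 11,172 syllables report `Canonical` instead.

```rust
// Hypothetical, simplified stand-in for unic's DecompositionType enum;
// the real type has more variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DecompositionType {
    Canonical,
    Compat,
}

// Hangul syllable range from the Unicode Standard; names are assumed,
// not copied from unic's hangul.rs.
const S_BASE: u32 = 0xAC00;
const S_COUNT: u32 = 11_172;

fn is_hangul_syllable(c: char) -> bool {
    (S_BASE..S_BASE + S_COUNT).contains(&(c as u32))
}

/// Hypothetical placeholder for the generated data-table lookup, which has
/// no entries for Hangul syllables.
fn table_lookup(c: char) -> Option<DecompositionType> {
    match c {
        '\u{00C5}' => Some(DecompositionType::Canonical), // A WITH RING ABOVE
        '\u{FB01}' => Some(DecompositionType::Compat),    // fi ligature
        _ => None,
    }
}

/// Check the algorithmic Hangul range first, then fall back to the table.
fn decomposition_type(c: char) -> Option<DecompositionType> {
    if is_hangul_syllable(c) {
        return Some(DecompositionType::Canonical);
    }
    table_lookup(c)
}

fn main() {
    for c in ['가', '힣', '\u{FB01}', 'a'] {
        println!("{:?} -> {:?}", c, decomposition_type(c));
    }
}
```

Checking the range first keeps the generated tables small: one comparison covers the whole block instead of 11,172 table entries.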

To test your work, remove the #[should_panic(...)] line from the conformance test (https://github.com/behnam/rust-unic/blob/master/unic/ucd/normal/tests/conformance_tests.rs#L34). All the tests should then pass.

What do you think?

calum commented

Sounds good! I'll get to work. Thanks for the tips.

@calum, this is the main code-related issue blocking the UNIC-0.5 release. Do you think you can submit a PR today or early tomorrow, or would you mind if I fix this myself and find you a new issue next week?

Btw, sorry for not mentioning that before. I just set up milestone tags for all major releases, so we can communicate these matters better.

calum commented

Hi, sorry, I won't be able to finish it this weekend. I've had family visiting this week, so unfortunately I haven't had time.

I'll find another issue to work on next week.

No worries, @calum. I'll fix this one, and we'll find you new issues next week. Thanks anyway! :)

Fixed by #72