denull/utf-c

Typo in rangesLatin declaration

mdmt1 opened this issue · 2 comments

mdmt1 commented

In both the Go and JS versions, the range for the comma character is specified incorrectly: {0x2D, 0x2C} instead of {0x2C, 0x2D}.

mdmt1 commented

Also, the auxOffset declaration contains an outdated comment:
// 0x0000, Latin is a special case, it merges A-Z, a-z, 0-9, "-" and " " characters.
Note that it says "-" ({0x2D, 0x2E}) rather than ",".
README.md has the same error, assuming the Habr post is the source of truth.

Actually, I think that was a mistake in the article: it's supposed to be "-" (i.e. {0x2D, 0x2E}). Obviously, one can choose "," instead in their own implementation (and tweak the code accordingly), depending on the context. For example, a dash can be more useful when storing words in a dictionary (or for texts with a lot of negative numbers? :)
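
For anyone checking their own port, here is a minimal Go sketch of why the swapped bounds matter. It assumes the ranges are half-open [start, end) pairs, which is what {0x2C, 0x2D} for "," and {0x2D, 0x2E} for "-" suggest. The `span` type and the `invertedSpans` and `contains` helpers are hypothetical names for illustration, not identifiers from this repository.

```go
package main

import "fmt"

// span is a hypothetical half-open code point interval [start, end).
// It is an assumption about how the range pairs are interpreted.
type span struct{ start, end rune }

// invertedSpans returns the indices of spans whose start is not below
// their end, i.e. spans that can never match any character.
func invertedSpans(spans []span) []int {
	var bad []int
	for i, s := range spans {
		if s.start >= s.end {
			bad = append(bad, i)
		}
	}
	return bad
}

// contains reports whether r falls inside any of the spans.
func contains(spans []span, r rune) bool {
	for _, s := range spans {
		if s.start <= r && r < s.end {
			return true
		}
	}
	return false
}

func main() {
	typo := []span{{0x2D, 0x2C}}  // as reported: bounds swapped
	comma := []span{{0x2C, 0x2D}} // matches "," only
	dash := []span{{0x2D, 0x2E}}  // matches "-" only

	fmt.Println(invertedSpans(typo))  // [0]
	fmt.Println(contains(typo, ','))  // false
	fmt.Println(contains(typo, '-'))  // false
	fmt.Println(contains(comma, ',')) // true
	fmt.Println(contains(dash, '-'))  // true
}
```

Under that assumption the swapped pair matches neither "," nor "-", so a check like `invertedSpans` run once over the table at startup or in a unit test would catch this kind of typo regardless of which character is chosen.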