Support parsing at byte, codepoint and grapheme cluster level.
honungsburk opened this issue · 0 comments
JavaScript strings are encoded as UTF-16, which means that a string has three different units of length, each useful in different situations.
- bytes: The number of actual bytes the string uses.
- codepoints: UTF-16 is a variable-length encoding, so 1 codepoint takes either 2 or 4 bytes (one or two 16-bit code units).
- grapheme clusters: 1 or more codepoints; this is what users of the library think of as "characters".
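To make the difference concrete, here is a small sketch that measures the same string at each level. The helper names (`utf16ByteLength`, `codepointLength`, `graphemeLength`) are made up for illustration and are not part of the library; the grapheme count assumes a runtime that provides `Intl.Segmenter`.

```javascript
// "e" + combining acute accent (one grapheme, two codepoints),
// followed by U+1F600 (one codepoint, two UTF-16 code units).
const sample = "e\u0301\u{1F600}";

// Hypothetical helper: bytes used by the UTF-16 encoding.
// String.prototype.length counts 16-bit code units, so multiply by 2.
function utf16ByteLength(s) {
  return s.length * 2;
}

// Hypothetical helper: the string iterator yields one item per codepoint.
function codepointLength(s) {
  return Array.from(s).length;
}

// Hypothetical helper: Intl.Segmenter splits on grapheme cluster boundaries.
function graphemeLength(s) {
  const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(seg.segment(s)).length;
}

console.log(sample.length);           // 4 code units
console.log(utf16ByteLength(sample)); // 8 bytes
console.log(codepointLength(sample)); // 3 codepoints
console.log(graphemeLength(sample));  // 2 grapheme clusters
```

Note that `sample.length` itself reports code units, which is none of the three numbers a user is likely to expect.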
Right now, I believe functions such as chompIf look at codepoints. This should be documented clearly, and we should add new combinators to parse a string at each level of fidelity: bytes, codepoints, or grapheme clusters.
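As a rough sketch of what the low-level building blocks for such combinators could look like (the names `nextCodeUnit`, `nextCodepoint` and `nextGrapheme` are hypothetical and do not exist in the library today), each function below takes the source string and an offset in UTF-16 code units and returns the offset just past the next unit at that level, or -1 at the end of input. It assumes `Intl.Segmenter` is available.

```javascript
// Hypothetical: advance by one 16-bit code unit (2 bytes).
function nextCodeUnit(src, offset) {
  return offset < src.length ? offset + 1 : -1;
}

// Hypothetical: advance by one codepoint (1 or 2 code units).
function nextCodepoint(src, offset) {
  const cp = src.codePointAt(offset);
  if (cp === undefined) return -1; // offset is past the end
  return offset + (cp > 0xffff ? 2 : 1);
}

// Hypothetical: advance to the end of the grapheme cluster containing offset.
const graphemes = new Intl.Segmenter(undefined, { granularity: "grapheme" });
function nextGrapheme(src, offset) {
  if (offset < 0 || offset >= src.length) return -1;
  const seg = graphemes.segment(src).containing(offset);
  return seg.index + seg.segment.length;
}

// "e" + combining accent, then U+1F600, then "x".
const input = "e\u0301\u{1F600}x";
console.log(nextCodeUnit(input, 2));  // 3 (stops in the middle of the surrogate pair)
console.log(nextCodepoint(input, 2)); // 4 (consumes the whole emoji)
console.log(nextGrapheme(input, 0));  // 2 (consumes "e" and its accent together)
```

A chompIf variant per level could then call the matching function to decide how much input to hand to its predicate and how far to advance.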