Opt for `charCodeAt` instead of `codePointAt`

Question

Closed this issue 3 years ago · 2 comments

`codePointAt` is slower than both `charCodeAt` and `str[i]`. See bevacqua/fuzzysearch#18 (comment) for ways to move to `charCodeAt` without sacrificing the extensibility that `codePointAt` gives us.
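For context, here is a minimal sketch of what the two methods return for a character outside the Basic Multilingual Plane. The helper names `codeUnits` and `codePoints` are made up for illustration and are not part of this library.

```javascript
// Hypothetical helpers, for illustration only.

// Returns every UTF-16 code unit of the string as a hex string.
function codeUnits(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    out.push(str.charCodeAt(i).toString(16));
  }
  return out;
}

// Returns every Unicode code point of the string as a hex string.
// A code point above 0xffff occupies two code units, so the index
// has to advance by 2 in that case.
function codePoints(str) {
  const out = [];
  for (let i = 0; i < str.length; ) {
    const cp = str.codePointAt(i);
    out.push(cp.toString(16));
    i += cp > 0xffff ? 2 : 1;
  }
  return out;
}

// U+1D11E MUSICAL SYMBOL G CLEF is stored as a surrogate pair.
const clef = '\u{1D11E}';
console.log(codeUnits(clef));  // [ 'd834', 'dd1e' ]
console.log(codePoints(clef)); // [ '1d11e' ]
```

For strings that contain only BMP characters, both helpers return the same values, which is why switching to `charCodeAt` only matters for input above 0xffff.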

Answer 1 · 2021-08-04T08:04:26.000Z

The very first thing we'll need to check is whether runes lying outside 0xffff have text casing and, if so, whether we need to change their case or whether they are better left as is.
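One way to check this, sketched below, is to compare a character with its upper- and lowercase forms. `hasCaseMapping` is a hypothetical helper, and the result depends on the Unicode data shipped with the JavaScript engine.

```javascript
// Hypothetical helper: reports whether the engine's default (non-locale)
// case conversion changes the character for the given code point.
function hasCaseMapping(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  return ch.toLowerCase() !== ch || ch.toUpperCase() !== ch;
}

// U+10428 DESERET SMALL LETTER LONG I lies outside 0xffff and has an
// uppercase counterpart (U+10400), so some astral runes do have casing.
console.log(hasCaseMapping(0x10428));

// U+1F600 GRINNING FACE is an emoji and has no case.
console.log(hasCaseMapping(0x1f600));
```

Since at least some scripts above 0xffff are cased, a `charCodeAt`-based comparison that lowercases one code unit at a time would presumably miss them.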

The other issue is that `toLocaleUpperCase` exists because in some languages, a lowercase character with a diacritic is transformed to an uppercase character without a diacritic. Whether the library should use the locale-aware version or the locale-independent version is debatable.
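To see how the two variants can diverge, a small sketch follows. It uses Turkish, a well-known case of locale-sensitive casing (not necessarily the diacritic case described above). `upperBoth` is a hypothetical helper, and the localized result depends on the locale data available in the runtime.

```javascript
// Hypothetical helper: returns the locale-independent and the
// locale-aware uppercase form of a string side by side.
function upperBoth(str, locale) {
  return {
    plain: str.toUpperCase(),
    localized: str.toLocaleUpperCase(locale),
  };
}

// In Turkish, lowercase "i" is expected to uppercase to a dotted capital
// (U+0130) when the runtime ships Turkish locale data; the plain variant
// always yields "I".
const result = upperBoth('i', 'tr');
console.log(result.plain, result.localized);
```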

Resources:

https://richardjharris.github.io/all-sorts-of-things-you-can-get-wrong-in-unicode-and-why.html
https://mathiasbynens.be/notes/javascript-unicode
https://stackoverflow.com/questions/42181070/why-does-code-points-between-ud800-and-udbff-generate-one-length-string-in-ecm
I benchmarked `charCodeAt` against `codePointAt` at https://jsbench.me/7hkrk6xyr5/1 by modifying https://gist.github.com/ajitid/96bed873d9d40b70d53ee4360194f565
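For a quick local comparison, a rough micro-benchmark along these lines could be used. This is only a sketch and not the code behind the jsbench link; `sumCharCodes`, `sumCodePoints`, and `timeIt` are made-up names, and timings will vary by engine and input.

```javascript
// Sums the UTF-16 code units of a string.
function sumCharCodes(str) {
  let total = 0;
  for (let i = 0; i < str.length; i++) total += str.charCodeAt(i);
  return total;
}

// Sums the value codePointAt returns at every index. For BMP-only
// input this is the same result as sumCharCodes.
function sumCodePoints(str) {
  let total = 0;
  for (let i = 0; i < str.length; i++) total += str.codePointAt(i);
  return total;
}

// Runs fn(str) `rounds` times and reports the elapsed milliseconds.
// The accumulated `sink` keeps the engine from discarding the work.
function timeIt(fn, str, rounds = 200) {
  let sink = 0;
  const start = performance.now();
  for (let r = 0; r < rounds; r++) sink += fn(str);
  return { ms: performance.now() - start, sink };
}

const sample = 'fuzzy search haystack '.repeat(1000);
console.log('charCodeAt ', timeIt(sumCharCodes, sample).ms.toFixed(2), 'ms');
console.log('codePointAt', timeIt(sumCodePoints, sample).ms.toFixed(2), 'ms');
```

Micro-benchmarks like this are sensitive to JIT warm-up and run order, so it is worth running each variant several times before drawing conclusions.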

Answer 2 · 2021-08-04T14:43:10.000Z

I ran a few tests and, surprisingly, couldn't find any performance difference when I changed the occurrences to `charCodeAt` and `fromCharCode` in this library. Closing this issue for now.