Opt for `charCodeAt` instead of `codePointAt`
Closed this issue · 2 comments
`codePointAt` is slower than `charCodeAt` and `str[i]`. See bevacqua/fuzzysearch#18 (comment) for ways to move to `charCodeAt` without sacrificing the extensibility that `codePointAt` provides.
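To make the trade-off concrete, here is a minimal sketch of reading a full code point using only `charCodeAt`. The helper name `codePointViaCharCodeAt` is hypothetical, and this is not necessarily the approach described in the linked comment; it only assumes the standard UTF-16 surrogate ranges.

```javascript
// Hypothetical helper (not part of this library): decode a code point
// at index i using only charCodeAt.
function codePointViaCharCodeAt(str, i) {
  const hi = str.charCodeAt(i);
  // High (leading) surrogates occupy 0xD800..0xDBFF.
  if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < str.length) {
    const lo = str.charCodeAt(i + 1);
    // Low (trailing) surrogates occupy 0xDC00..0xDFFF.
    if (lo >= 0xdc00 && lo <= 0xdfff) {
      return (hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000;
    }
  }
  // BMP character or unpaired surrogate: return the code unit as is.
  return hi;
}

const emoji = "\u{1F600}";
console.log(emoji.charCodeAt(0).toString(16)); // first code unit only
console.log(codePointViaCharCodeAt(emoji, 0) === emoji.codePointAt(0));
```

For strings that stay inside the BMP the surrogate branch is never taken, so the hot path is a single `charCodeAt` call plus a range check.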
The very first thing we'll need to check is whether runes lying outside 0xffff have text casing, and if so, whether we need to change their case or whether they are better left as is.
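One quick way to probe this is to round-trip a code point through `toUpperCase` and `toLowerCase` and see whether anything changes. The sketch below uses a hypothetical helper `hasCasing`; the Deseret letters (U+10400 and U+10428) are picked as an example of a cased script outside the BMP, and the result depends on the Unicode data shipped with the engine.

```javascript
// Hypothetical helper: true if the code point changes under case mapping.
function hasCasing(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  return ch.toUpperCase() !== ch || ch.toLowerCase() !== ch;
}

console.log(hasCasing(0x61));    // "a", a cased BMP letter
console.log(hasCasing(0x10428)); // DESERET SMALL LETTER LONG I
console.log(hasCasing(0x1f600)); // an emoji, which has no case

// Casing one UTF-16 code unit at a time cannot handle such letters,
// because a lone surrogate has no case mapping of its own.
const small = String.fromCodePoint(0x10428);
const perUnit = String.fromCharCode(small.charCodeAt(0)).toUpperCase();
console.log(perUnit === small[0]); // the surrogate is left unchanged
```

If cased runes outside 0xffff matter for matching, any `charCodeAt`-based path would need to case-fold the whole string first rather than individual code units.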
Another issue is that `toLocaleUpperCase` exists because in some languages, a lowercase character with a diacritic is transformed to an uppercase character without a diacritic. Whether to use the locale-aware version or the locale-independent version (in the library) is debatable.
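To see the difference side by side, here is a small sketch comparing the two methods (the locale-aware one is spelled `toLocaleUpperCase`). The helper `compareUpper` is hypothetical, and the locale-specific results assume the engine ships locale data for Greek (`el`) and Turkish (`tr`); without that data, both calls may return the same string.

```javascript
// Hypothetical helper: uppercase with and without a locale.
function compareUpper(text, locale) {
  return {
    plain: text.toUpperCase(),
    localized: text.toLocaleUpperCase(locale),
  };
}

// Greek: the locale-aware mapping is expected to drop the tonos accent.
const greek = compareUpper("\u03ac", "el"); // "ά"
// Turkish: dotless/dotted i, the classic locale-sensitive case mapping.
const turkish = compareUpper("i", "tr");

console.log(greek, turkish);
console.log(greek.plain === greek.localized);     // do the two agree?
console.log(turkish.plain === turkish.localized); // do the two agree?
```

Since a search library usually does not know the language of its input, the locale-independent method gives results that are at least the same on every machine, which is one argument for it.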
Resources:
- https://richardjharris.github.io/all-sorts-of-things-you-can-get-wrong-in-unicode-and-why.html
- https://mathiasbynens.be/notes/javascript-unicode
- https://stackoverflow.com/questions/42181070/why-does-code-points-between-ud800-and-udbff-generate-one-length-string-in-ecm
- Benchmarked `charCodeAt` and `codePointAt` on https://jsbench.me/7hkrk6xyr5/1 by modifying https://gist.github.com/ajitid/96bed873d9d40b70d53ee4360194f565
So I ran a few tests, and surprisingly I couldn't find any perf difference when I changed occurrences to `charCodeAt` and `fromCharCode` in this library. Closing this issue for now.
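For anyone who wants to repeat the comparison locally, here is a rough micro-benchmark sketch. It is not the code from the jsbench link or the gist; the function names, sample string, and round count are all assumptions, and timings from a loop like this are noisy and depend on the engine and input.

```javascript
// Sum code units with charCodeAt.
function sumCharCodes(str) {
  let sum = 0;
  for (let i = 0; i < str.length; i++) sum += str.charCodeAt(i);
  return sum;
}

// Sum code points with codePointAt (index still advances per code unit).
function sumCodePoints(str) {
  let sum = 0;
  for (let i = 0; i < str.length; i++) sum += str.codePointAt(i);
  return sum;
}

// Hypothetical timing helper; `sink` keeps the work from being optimized away.
function timeIt(fn, str, rounds) {
  let sink = 0;
  const start = performance.now();
  for (let r = 0; r < rounds; r++) sink += fn(str);
  return { ms: performance.now() - start, sink };
}

const sample = "the quick brown fox ".repeat(1000); // ASCII-only input
const viaCharCode = timeIt(sumCharCodes, sample, 200);
const viaCodePoint = timeIt(sumCodePoints, sample, 200);
console.log("charCodeAt ms:", viaCharCode.ms.toFixed(2));
console.log("codePointAt ms:", viaCodePoint.ms.toFixed(2));
```

Running it a few times and with inputs typical for the library gives a better picture than a single run.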