Enhancement: Optimize generated classes with character data
OptimumCode opened this issue · 1 comments
Is there an existing issue for this?
- I have searched the existing issues
Enhancement description
Currently generated classes with character data check if the codepoint belongs to this class by using some kind of binary search between the ranges this class has. This causes O(log(n)) complexity to find if the class contains the codepoint in the worst case.
I think this can be optimized in the following way:
We can divide ranges into sub-groups by applying a shift-right bit operation (codepoint >> 16). This will group ranges based on their higher bits. As a result, we will have smaller groups of ranges to apply our binary search and we will be able to filter out codepoints that cannot belong to the class by comparing the result of a shift-right operation with known values for this class.
fun contains(codepoint: Int): Boolean {
if (codepoint < minCodepoint || codepoint > maxCodepoint) {
return false
}
return when (codepoint shr 8) {
in 0x0..0x2 -> // binary search in this group
0x10 -> // binary search in this group
else -> false
}
}
Any attempt to group ranges and keep the correctness did not give any performance benefits. Benchmarks show that the performance only gets worse (2-3 times worse).
I will close this issue for now. If new ideas on how to optimize the generated code come to my mind (or somebody else's) I will reopen this issue.