contour-terminal/libunicode

Optimize performance for grapheme cluster break lookup (and other tables)

christianparpart opened this issue · 1 comments

Checklist

  • implement table lookup based on https://www.strchr.com/multi-stage_tables for all tables
  • Evaluate whether commonly looked-up attributes (grapheme break, script, width, emoji default presentation, ...) can be joined into a single table
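To make the two checklist items above concrete, here is a minimal sketch of a two-stage table that returns several attributes from one lookup. It is not libunicode's actual implementation: `Properties`, `TwoStageTable`, `buildTable`, and `toyClassify` are hypothetical names, the block size of 256 is an arbitrary choice, and the classifier is toy data (it only knows that U+0300..U+036F are combining marks), not the real Unicode Character Database.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical record joining several per-code-point attributes.
struct Properties {
    std::uint8_t graphemeBreak = 0; // hypothetical class id: 0 = Other, 1 = Extend
    std::uint8_t width = 1;         // column width
    bool emojiPresentation = false;
};

inline bool operator==(Properties const& a, Properties const& b) {
    return a.graphemeBreak == b.graphemeBreak && a.width == b.width
        && a.emojiPresentation == b.emojiPresentation;
}

constexpr std::size_t BlockSize = 256;      // assumption: 8 low bits per block
constexpr char32_t MaxCodePoint = 0x10FFFF;

using Block = std::array<Properties, BlockSize>;

struct TwoStageTable {
    std::vector<std::uint16_t> stage1; // one block index per (code point >> 8)
    std::vector<Block> stage2;         // deduplicated blocks of properties

    Properties lookup(char32_t cp) const {
        if (cp > MaxCodePoint)
            return Properties{};       // out of range: default properties
        return stage2[stage1[cp >> 8]][cp & 0xFF];
    }
};

// Builds the table from any callable mapping a code point to Properties.
// Identical blocks are stored once; that sharing is where the space saving comes from.
template <typename Classifier>
TwoStageTable buildTable(Classifier classify) {
    TwoStageTable table;
    for (char32_t base = 0; base <= MaxCodePoint; base += BlockSize) {
        Block block{};
        for (std::size_t i = 0; i < BlockSize; ++i)
            block[i] = classify(static_cast<char32_t>(base + i));
        auto it = std::find(table.stage2.begin(), table.stage2.end(), block);
        if (it == table.stage2.end()) {
            table.stage2.push_back(block);
            it = std::prev(table.stage2.end());
        }
        table.stage1.push_back(static_cast<std::uint16_t>(it - table.stage2.begin()));
    }
    return table;
}

// Toy classifier (NOT real UCD data): only Combining Diacritical Marks are special.
inline Properties toyClassify(char32_t cp) {
    Properties p;
    if (cp >= 0x0300 && cp <= 0x036F) {
        p.graphemeBreak = 1; // Extend
        p.width = 0;
    }
    return p;
}
```

With this layout, `buildTable(toyClassify).lookup(cp)` costs two array indexings and yields all joined attributes at once. In a real build the two stages would be generated offline from the UCD files and emitted as constant arrays rather than computed at startup.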

Future investigation

The utf8proc team has done some very good research on this: https://halt.software/optimizing-unicodes-grapheme-cluster-break-algorithm/

We could see if we can implement it like that, too, document it, and reference their great work.

  • implement break algorithm based on the above web link
  • document the idea behind that algorithm so that one can understand it without having to consult further web articles (which may be deleted in the future)
  • perf-test against the naive implementation (probably simply as part of contour-terminal/contour#692, which desperately needs better performance from the break algorithm)
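As a rough illustration of what "table-driven versus naive" could look like, the sketch below precomputes a pairwise break table and keeps the naive rule chain next to it for comparison. This is not the algorithm from the linked article and not libunicode's code. It covers only a subset of the UAX #29 rules (GB3, GB4, GB5, GB9, GB9a, GB999) and leaves out the Hangul, emoji ZWJ, and regional-indicator rules, the last two of which need more state than a pair of classes. All names (`GB`, `naiveBreak`, `makeBreakTable`, `tableBreak`) are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Reduced set of grapheme cluster break classes (subset of UAX #29).
enum class GB : unsigned char { Other, CR, LF, Control, Extend, ZWJ, SpacingMark, Count };

constexpr std::size_t ClassCount = static_cast<std::size_t>(GB::Count);

// Naive implementation: walk the rules in order for every pair.
// Returns true if there is a break between a (left) and b (right).
constexpr bool naiveBreak(GB a, GB b) {
    if (a == GB::CR && b == GB::LF) return false;                     // GB3
    if (a == GB::CR || a == GB::LF || a == GB::Control) return true;  // GB4
    if (b == GB::CR || b == GB::LF || b == GB::Control) return true;  // GB5
    if (b == GB::Extend || b == GB::ZWJ) return false;                // GB9
    if (b == GB::SpacingMark) return false;                           // GB9a
    return true;                                                      // GB999
}

using BreakTable = std::array<std::array<bool, ClassCount>, ClassCount>;

// Evaluate the rule chain once per pair of classes and store the result.
inline BreakTable makeBreakTable() {
    BreakTable table{};
    for (std::size_t i = 0; i < ClassCount; ++i)
        for (std::size_t j = 0; j < ClassCount; ++j)
            table[i][j] = naiveBreak(static_cast<GB>(i), static_cast<GB>(j));
    return table;
}

// Table-driven decision: a single indexed load instead of a branch chain.
inline bool tableBreak(BreakTable const& table, GB a, GB b) {
    return table[static_cast<std::size_t>(a)][static_cast<std::size_t>(b)];
}
```

Having both functions side by side gives a perf test a correctness baseline: the table must agree with `naiveBreak` for every pair of classes before timings are compared.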

Done in #46.