toptensoftware/RichTextKit

Missing glyphs when rendering clusters with unmatched combining characters


"Missing glyph" boxes are rendered for a grapheme cluster whose base character is contained in the font but whose combining characters are not. This is easy to reproduce by rendering text with obscure diacritics, e.g. A, in a font like "Agency FB".

I believe this happens because the font fallback mechanism only applies fallbacks at cluster boundaries (in `FontFallback.GetFontRuns`):

```csharp
// Must be a cluster boundary
if (!GraphemeClusterAlgorithm.IsBoundary(codePoints, i))
    continue;
```
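To make the failure mode concrete, here is a simplified model of boundary-only fallback. It is a sketch under stated assumptions, not RichTextKit's actual code: fonts are modeled as hypothetical sets of supported code points (`HashSet<int>`), grapheme clusters are assumed to be already segmented, and a font is only chosen at the start of each cluster, from its first code point. Every other code point in the cluster then inherits that font, even when the font has no glyph for it. The class and method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of boundary-only fallback; not RichTextKit's code.
public static class BoundaryOnlyFallback
{
    // Index of the first font (in preference order) that contains the code
    // point, or 0 (the primary font) if none does.
    public static int FirstCovering(IReadOnlyList<HashSet<int>> fonts, int codePoint)
    {
        for (int f = 0; f < fonts.Count; f++)
            if (fonts[f].Contains(codePoint))
                return f;
        return 0;
    }

    // Assigns a font index to every code point. A font is only chosen at a
    // cluster boundary (from the cluster's first code point); the rest of the
    // cluster keeps that font whether or not it covers those code points.
    public static int[] Assign(IReadOnlyList<int[]> clusters, IReadOnlyList<HashSet<int>> fonts)
    {
        var result = new List<int>();
        foreach (var cluster in clusters)
        {
            int font = FirstCovering(fonts, cluster[0]);
            result.AddRange(Enumerable.Repeat(font, cluster.Length));
        }
        return result.ToArray();
    }

    // Code points assigned to a font that has no glyph for them; these are
    // the ones that would be drawn as missing-glyph boxes.
    public static int[] Missing(IReadOnlyList<int[]> clusters, IReadOnlyList<HashSet<int>> fonts)
    {
        int[] flat = clusters.SelectMany(c => c).ToArray();
        int[] assigned = Assign(clusters, fonts);
        return flat.Where((cp, i) => !fonts[assigned[i]].Contains(cp)).ToArray();
    }
}
```

For example, with a primary font covering only `A` (U+0041) and a fallback font covering `A` plus U+0301 COMBINING ACUTE ACCENT, the single cluster `{ 0x41, 0x301 }` is assigned font 0 for both code points, so `Missing` returns `{ 0x301 }`: the base character "wins" the font choice and the combining mark is left without a glyph.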

When rendering text like this, Firefox, Chrome, and GDI+ each produce slightly different results, but all of them at least fall back to some other font for the missing combining characters. The CSS specification has some guidance on this problem [1], although that solution may be overkill for this library. A simpler solution might be either to use any available font for the missing characters (i.e. remove the boundary limitation, which seems to be what Firefox does), or to use the font that matches the most consecutive characters in the cluster (which seems to be what Chrome does).
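The two simpler options can be sketched against the same kind of toy model: hypothetical fonts as sets of supported code points, and clusters given as pre-segmented code point arrays. `PerCodePoint` approximates the Firefox-like behavior by ignoring cluster boundaries and picking the first covering font for each code point. `LongestPrefix` is only one possible reading of the Chrome-like behavior: for each cluster, pick the font that covers the longest run of consecutive code points starting at the base character, preferring earlier fonts on ties. Neither is RichTextKit code, and neither claims to be what the browsers actually implement; all names here are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketches of two fallback strategies; fonts are modeled as
// sets of supported code points and clusters are already segmented.
public static class FallbackStrategies
{
    // First font (in preference order) that covers the code point,
    // or 0 (the primary font) if none does.
    static int FirstCovering(IReadOnlyList<HashSet<int>> fonts, int codePoint)
    {
        for (int f = 0; f < fonts.Count; f++)
            if (fonts[f].Contains(codePoint))
                return f;
        return 0;
    }

    // Firefox-like: ignore cluster boundaries and choose a font for every
    // code point independently, so combining marks may use a different
    // font than their base character.
    public static int[] PerCodePoint(IReadOnlyList<int[]> clusters, IReadOnlyList<HashSet<int>> fonts) =>
        clusters.SelectMany(c => c).Select(cp => FirstCovering(fonts, cp)).ToArray();

    // Chrome-like (one interpretation): keep each cluster in a single font,
    // choosing the font that covers the most consecutive code points from
    // the start of the cluster. Ties go to the earlier (preferred) font.
    public static int[] LongestPrefix(IReadOnlyList<int[]> clusters, IReadOnlyList<HashSet<int>> fonts)
    {
        var result = new List<int>();
        foreach (var cluster in clusters)
        {
            int bestFont = 0, bestLength = -1;
            for (int f = 0; f < fonts.Count; f++)
            {
                int length = 0;
                while (length < cluster.Length && fonts[f].Contains(cluster[length]))
                    length++;
                if (length > bestLength)
                {
                    bestFont = f;
                    bestLength = length;
                }
            }
            result.AddRange(Enumerable.Repeat(bestFont, cluster.Length));
        }
        return result.ToArray();
    }
}
```

For a cluster `{ 0x41, 0x301, 0x353 }` with fonts `{ 0x41 }`, `{ 0x41, 0x301 }`, and `{ 0x353 }`, `PerCodePoint` yields `{ 0, 1, 2 }` while `LongestPrefix` yields `{ 1, 1, 1 }`. The per-code-point approach leaves no missing glyphs; the whole-cluster approach keeps the base and its marks in one font (mark positioning relative to a base generally only works within a single font), at the cost of U+0353 still having no glyph in the chosen font.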

Is there a reason fallbacks are skipped when not at a cluster boundary? Simply removing this limitation produces rendering fairly similar to Firefox's.

[1]: https://drafts.csswg.org/css-fonts/#cluster-matching

It looks like GitHub strips obscure diacritics. Here's a screenshot of the issue when rendering A after putting it through a "Zalgo" text generator:

[Screenshot: RichTextKit rendering of the Zalgo text, with missing glyphs]

For reference, here it is in Firefox:

[Screenshot: the same Zalgo text rendered in Firefox]

And here it is rendered by RichTextKit with the cluster boundary limitation removed:

[Screenshot: RichTextKit rendering of the Zalgo text with the cluster boundary check removed]