saidwho12/hamza

latin combining characters config question

Closed this issue · 8 comments

I can't get the stbtt_rasterize demo to properly display u with a combining umlaut ("u\u0308" in C). I have little experience with Unicode text shaping, so I assume this is a configuration issue or something else I don't know about. I've tried a few different feature combinations without success.

What features do I need to enable for this to work? I have ABVM and CCMP enabled, since those seem obvious.

After some research I found out that this is simply an issue of Unicode Normalization. It's supposed to work without even specifying any typographical features. I'm gonna be working on a fix right now.

Very cool. Can you explain a bit more? What about normalization is needed for this to work? After reading about it, I'm not quite sure.

Yeah, so Unicode Normalization takes care of combining characters, and it also cleans up a Unicode string in general. You can read more on it here: https://www.unicode.org/reports/tr15/

So say I have a string of Arabic where some characters are encoded as presentation forms (MEDIAL, FINAL, ISOLATED) and others as the plain base letters. The compatibility normalization forms (NFKC/NFKD) would turn everything into the plain base letters.
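To make the Arabic case concrete, here is a small sketch using Python's standard `unicodedata` module rather than hamza itself. The helper name `fold_presentation_forms` is made up for illustration. It shows the four presentation forms of BEH folding to the base letter under NFKC, while NFC leaves them alone.

```python
import unicodedata

# U+FE8F..U+FE92: isolated, final, initial and medial presentation forms of BEH.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]
BEH_BASE = "\u0628"  # ARABIC LETTER BEH


def fold_presentation_forms(text: str) -> str:
    """Hypothetical helper: map compatibility forms back to base letters."""
    return unicodedata.normalize("NFKC", text)


folded = [fold_presentation_forms(ch) for ch in BEH_FORMS]

# Canonical normalization (NFC) does not touch presentation forms.
unchanged_by_nfc = [unicodedata.normalize("NFC", ch) == ch for ch in BEH_FORMS]
```

So whether the presentation forms get folded depends on which normalization form is applied.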

In normalization, if you have a Latin alphabet character followed by a combining mark, the composing form (NFC) will combine them into a single precomposed Unicode character when one exists. It also has to validate that the combining marks are valid. OpenType tables, on the other hand, work on glyphs and don't know anything about Unicode characters. They're used to implement more complex font features like ligatures, kerning, cursive scripts, stylistic alternates, etc.
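For the string from the original question, the composition step looks like this. This is only an illustration with Python's `unicodedata`, not the code hamza uses internally.

```python
import unicodedata

decomposed = "u\u0308"   # LATIN SMALL LETTER U + COMBINING DIAERESIS
precomposed = "\u00FC"   # LATIN SMALL LETTER U WITH DIAERESIS

# NFC composes the pair into the single precomposed code point.
composed = unicodedata.normalize("NFC", decomposed)

# NFD goes the other way and splits the precomposed character again.
split_again = unicodedata.normalize("NFD", precomposed)
```

After composition the shaper only has to map one code point to one glyph, which is why no typographic feature is needed for this particular case.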

It seems so, but not all valid graphemes that use combining characters can be composed into a single codepoint. Is there something about normalization enforcing ordering that is the missing piece?
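Both points in this question can be checked with Python's `unicodedata` (again just an illustration, independent of hamza). "q" with a diaeresis has no precomposed code point, so NFC leaves it as two code points. Separately, normalization sorts marks by canonical combining class, so two different typing orders end up as the same sequence.

```python
import unicodedata

# No precomposed "q with diaeresis" exists, so NFC cannot shrink this.
q_umlaut = "q\u0308"
q_nfc = unicodedata.normalize("NFC", q_umlaut)

# COMBINING ACUTE (class 230) and COMBINING DOT BELOW (class 220).
acute, dot_below = "\u0301", "\u0323"
classes = (unicodedata.combining(acute), unicodedata.combining(dot_below))

# Same grapheme typed with the marks in two different orders.
typed_a = "a" + acute + dot_below
typed_b = "a" + dot_below + acute

# Canonical ordering puts the lower combining class first.
nfd_a = unicodedata.normalize("NFD", typed_a)
nfd_b = unicodedata.normalize("NFD", typed_b)
nfc_a = unicodedata.normalize("NFC", typed_a)
nfc_b = unicodedata.normalize("NFC", typed_b)
```

For sequences like `q_umlaut`, the mark still has to be placed by the font's mark positioning data or by a fallback, since normalization alone cannot produce a single glyph.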

if you have discord and you're ok with adding me, i can also message you there since github is inconvenient. superdivider#2041

I'm also confused about the API. Why, in the stbtt_example, do you iterate through the glyph metrics and keep track of pen_x manually? How is this supposed to work with decomposed glyphs? I would expect to get extents and a list of positions / glyph identifiers / dimensions, since not all glyphs are just put at the end of the LTR layout.

This is how every library does it. What do you mean by "this doesn't work with decomposed glyphs"? If you mean mark placement, then this works because marks don't advance, and x_offset and y_offset take care of positioning.
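A minimal sketch of that pen loop, written in Python for brevity. The `ShapedGlyph` record, the `place_glyphs` function and all the numbers are hypothetical and are not hamza's actual API or real font metrics. The sketch assumes the mark's offset is expressed relative to the pen position after the base glyph has advanced.

```python
from dataclasses import dataclass


@dataclass
class ShapedGlyph:
    """Hypothetical shaper output record (not hamza's real struct)."""
    glyph_id: int
    x_advance: int
    x_offset: int
    y_offset: int


def place_glyphs(glyphs, origin_x=0, origin_y=0):
    """Return (glyph_id, x, y) draw positions for a left-to-right run."""
    pen_x = origin_x
    placed = []
    for g in glyphs:
        # Offsets shift where the glyph is drawn but never move the pen.
        placed.append((g.glyph_id, pen_x + g.x_offset, origin_y + g.y_offset))
        # Only the advance moves the pen; marks have zero advance.
        pen_x += g.x_advance
    return placed


run = [
    ShapedGlyph(glyph_id=1, x_advance=500, x_offset=0, y_offset=0),     # base "u"
    ShapedGlyph(glyph_id=2, x_advance=0, x_offset=-250, y_offset=0),    # combining mark
    ShapedGlyph(glyph_id=3, x_advance=500, x_offset=0, y_offset=0),     # next base glyph
]
positions = place_glyphs(run)
```

The mark is drawn back over the base glyph, and the following glyph starts exactly where it would have without the mark, so the caller gets the list of positions by running this loop.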