Noto Sans Syriac stch implementation is wrong (but also correct)

Question

simoncozens opened this issue 3 years ago · 5 comments

In Syriac, U+070F SYRIAC ABBREVIATION MARK is processed in a special way. The Arabic complex shaper applies a "magic" feature, stch, expecting a multiple substitution which returns an odd number of glyphs:

feature stch {
  sub abbreviation-syriac by
    abbreviation-syriac.start
    abbreviation-syriac.line
    abbreviation-syriac.linedot
    abbreviation-syriac.line
    abbreviation-syriac.end;
} stch;

The shaper then (a) computes the full width of the text to be shaped, (b) puts the first returned glyph at the start of the sequence, (c) puts the final returned glyph at the end, (d) distributes the remaining odd-numbered glyphs across the length of the text, and (e) fills the gaps between the odd-numbered glyphs with the even-numbered glyphs, repeating them until they cover the whole length of the text.

You can see this using Harfbuzz on Microsoft's Segoe UI Historic font:

$ shape seguihis.ttf  -u '70f 710 712 713 715'
[gid2404=4+723|gid2399=3+1589|gid2395=2+956|gid2335=1+1524|gid2486=0@-4800,0+0|gid2487=0@-4442,0+0|gid2487=0@-4141,0+0|gid2487=0@-3840,0+0|gid2487=0@-3539,0+0|gid2487=0@-3238,0+0|gid2487=0@-2937,0+0|gid2488=0@-2579,0+0|gid2487=0@-2221,0+0|gid2487=0@-1920,0+0|gid2487=0@-1619,0+0|gid2487=0@-1318,0+0|gid2487=0@-1017,0+0|gid2487=0@-716,0+0|gid2489=0@-358,0+0]
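
The steps described above can be modelled in a few lines of Python. This is only a simplified sketch, not HarfBuzz's actual code: the function name `stretch` is made up for illustration, the piece order is taken from the output above, and the advance width of 358 units for every piece is an assumption inferred from the offset differences rather than read from the font. Odd-numbered pieces are placed once, even-numbered pieces are repeated, and one extra repeat is squeezed in with a small overlap when the repeats would otherwise fall short of the text width.

```python
def stretch(pieces, w_total):
    """Lay out stch pieces over a run of width w_total.

    pieces: list of (glyph_name, advance_width), odd in length.
    Counting from 1, odd-numbered pieces are fixed and even-numbered
    pieces repeat. Returns (glyph_name, x_offset) pairs, with the
    whole run ending at x = 0.
    """
    fixed = pieces[0::2]       # 1st, 3rd, 5th, ... piece
    repeating = pieces[1::2]   # 2nd, 4th, ... piece
    w_fixed = sum(w for _, w in fixed)
    w_repeating = sum(w for _, w in repeating)
    n_repeating = len(repeating)

    # Width left over for the repeating pieces.
    w_remaining = w_total - w_fixed
    n_copies = 0  # extra copies beyond the first of each repeating piece
    if w_repeating > 0 and w_remaining > w_repeating:
        n_copies = w_remaining // w_repeating - 1

    # If the repeats fall short, add one more and overlap them slightly.
    overlap = 0
    shortfall = w_remaining - w_repeating * (n_copies + 1)
    if shortfall > 0 and n_repeating > 0:
        n_copies += 1
        excess = (n_copies + 1) * w_repeating - w_remaining
        if excess > 0:
            overlap = excess // (n_copies * n_repeating)

    placed = []
    x = 0
    for index, (name, width) in enumerate(pieces):
        repeats = 1 if index % 2 == 0 else n_copies + 1
        for k in range(repeats):
            if k > 0:
                x -= overlap
            placed.append((name, x))
            x += width
    # Shift everything left so the run ends at the origin.
    return [(name, pos - x) for name, pos in placed]


# Assumed widths (358 each); glyph order as in the Segoe UI Historic output.
SEGOE_PIECES = [
    ("gid2486", 358), ("gid2487", 358), ("gid2488", 358),
    ("gid2487", 358), ("gid2489", 358),
]
# Sum of the advances of the four letters in the output above.
TEXT_WIDTH = 723 + 1589 + 956 + 1524

for name, x in stretch(SEGOE_PIECES, TEXT_WIDTH):
    print(f"{name}@{x}")
```

Under those assumptions the printed offsets line up with the ones in the Segoe UI Historic output, including the 301-unit spacing between repeats.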

Unfortunately CoreText does not process the stch feature at all - not even a simple application, and particularly not the implementation with all the shaper magic:

$ shape seguihis.ttf  -u '70f 710 712 713 715' --shaper=coretext
[gid2404=4+723|gid2399=3+1589|gid2395=2+956|gid2335=1+1524|gid2334=0+41]

Now, to Noto. I guess the Noto Sans Syriac developers knew that stch was supposed to be a thing, and because they were developing on OS X and nothing was happening when they tried it, implemented it themselves, manually. It's an incredible feat of engineering, tens of thousands of lines of code, and what it does is really clever.

It's kind of a shame that it should all be replaced with a single line of code.

Well, the end result of this extremely clever manual implementation of something the shaper should have been doing all along, is that the stch feature is now completely broken on shapers which do do the right thing, like Uniscribe and Harfbuzz:

$ shape notosanssyriac/NotoSansSyriac-Regular.ttf -u '70f 710 712 713 715'
[SAM4xout=4@-2756,0+0|uni0715.Fina=4@-2756,0+525|SAM8in=4@-2231,0+0|SAM18out=3@-2231,0+0|uni0713.Medi=3@-2231,0+571|SAM8in=3@-1660,0+0|SAM12out=2@-1660,0+0|uni0712.Init=2@-1660,0+730|SAM8in=2@-930,0+0|SAM12out=1@-930,0+0|uni0710=1@-930,0+930|SAM8xin=1+0|uni070F.blank=0+0]

But at least it does work on CoreText:

$ shape notosanssyriac/NotoSansSyriac-Regular.ttf -u '70f 710 712 713 715' --shaper=coretext
[SAM4xout=4+0|uni0715.Fina=4+525|SAM8in=4+0|SAM4out=3+0|uni0713.Medi=3+571|SAM8in=3+0|SAM12out=2+0|uni0712.Init=2+729|SAM8in=2+0|SAM12out=1+0|uni0710=1+930|SAM8xin=1+0|uni070F.blank=0+0]
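
To compare the two Noto results glyph by glyph, it can help to parse the shaping output into records. The sketch below assumes each record has the form name=cluster, optionally @x_offset,y_offset, then +x_advance, as in the lines above; `Glyph` and `parse_shape_output` are names made up for this example. The sample strings are just the first three records of each Noto output.

```python
import re
from typing import NamedTuple


class Glyph(NamedTuple):
    name: str
    cluster: int
    x_offset: int
    y_offset: int
    x_advance: int


# name=cluster[@x_offset,y_offset]+x_advance
_RECORD = re.compile(
    r"(?P<name>[^=|\[\]]+)=(?P<cluster>\d+)"
    r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?"
    r"\+(?P<adv>-?\d+)"
)


def parse_shape_output(line):
    """Parse one bracketed output line into a list of Glyph records."""
    glyphs = []
    for item in line.strip().strip("[]").split("|"):
        match = _RECORD.fullmatch(item)
        if match is None:
            raise ValueError(f"unrecognised glyph record: {item!r}")
        glyphs.append(Glyph(
            name=match["name"],
            cluster=int(match["cluster"]),
            x_offset=int(match["dx"] or 0),   # offsets are omitted when zero
            y_offset=int(match["dy"] or 0),
            x_advance=int(match["adv"]),
        ))
    return glyphs


# First three records of the HarfBuzz and CoreText outputs for Noto.
harfbuzz = parse_shape_output(
    "[SAM4xout=4@-2756,0+0|uni0715.Fina=4@-2756,0+525|SAM8in=4@-2231,0+0]"
)
coretext = parse_shape_output("[SAM4xout=4+0|uni0715.Fina=4+525|SAM8in=4+0]")

# Glyphs that carry an x offset under each shaper.
print([g.name for g in harfbuzz if g.x_offset != 0])
print([g.name for g in coretext if g.x_offset != 0])
```

The advances agree between the two shapers in this excerpt; the difference is that only the HarfBuzz result carries x offsets.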

I've filed a Radar bug for CoreText, because CoreText is obviously not being compliant with the OTSpec here. So when CoreText gets fixed, it won't work there either.

Answer 1 · 2021-06-25T16:26:36.000Z

Now, to Noto. I guess the Noto Sans Syriac developers knew that stch was supposed to be a thing, and because they were developing on OS X and nothing was happening when they tried it, implemented it themselves, manually. It's an incredible feat of engineering, tens of thousands of lines of code, and what it does is really clever.

Le sighz.

Answer 2 · 2021-06-25T16:32:24.000Z

We should check if macOS still prefers the 0.3 cmap over 3.1, as it used to. If it does, it's possible to make a font that behaves differently in macOS vs. other implementations. (That's why we have the per-platform cmaps after all!)

Then we could duplicate the glyphs in question and assign the right Unicodes to different glyphs in the two different cmaps.

And then ofc the stch feature for the glyphs that come from the 3.1 cmap should be simple and that for the glyphs that come from the 0.3 cmap should be the incredible one 😀
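
As a rough illustration of the idea only: cmap subtables are modelled here as plain dictionaries keyed by (platform ID, encoding ID), and the glyph names `uni070F` and `uni070F.manual` are hypothetical. The preference orders are assumptions too, since whether macOS really prefers 0.3 over 3.1 is exactly what would need checking.

```python
def pick_cmap(subtables, preference):
    """Return the first subtable present in the engine's preference order.

    subtables: dict mapping (platform_id, encoding_id) -> {codepoint: glyph}
    preference: list of (platform_id, encoding_id) keys, most preferred first
    """
    for key in preference:
        if key in subtables:
            return subtables[key]
    raise LookupError("no usable cmap subtable")


# Hypothetical font: the same code point maps to different glyphs per cmap.
FONT_CMAPS = {
    (0, 3): {0x070F: "uni070F.manual"},  # glyph wired to the hand-built stch
    (3, 1): {0x070F: "uni070F"},         # glyph wired to the simple stch
}

# Assumed preference orders, not verified against any real engine.
MACOS_ORDER = [(0, 3), (3, 1)]
OTHER_ORDER = [(3, 1), (0, 3)]

print(pick_cmap(FONT_CMAPS, MACOS_ORDER)[0x070F])
print(pick_cmap(FONT_CMAPS, OTHER_ORDER)[0x070F])
```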

Answer 3 · 2021-06-25T16:34:36.000Z

Then we could duplicate the glyphs in question and assign the right Unicodes to different glyphs in the two different cmaps.

But we shouldn't.

If this had been properly raised with Ned, it would probably have been fixed by now...

Answer 4 · 2021-06-25T16:37:04.000Z

Trying to bend the font to support broken shapers is never the right thing to do, especially with the huge amount of complexity that it adds, and doubly especially when doing so messes up the font on correctly-implemented shapers.

Just do the right thing, and we can get the shapers fixed. :-)

Answer 5 · 2021-06-25T16:41:26.000Z

I mean, this may not be a sensible way of doing it, but that principle is still useful on some occasions, and I wouldn't fully discredit it. In fact, if we consider cmap to be, in principle, the platform-specific entry point to the font, you can have fonts that implement the expected behaviors depending on the platform.

Font engine makers COULD subscribe to the principle that they prefer some implementation-specific cmap over the generic one if present. We could agree that PID>255 are to be "distributed" among various vendors and EID is theirs for the taking (they could use it as a version number of their engine).

I think it's unrealistic that all engines will always work the same way. :) Esp. with complex shaping.