Add errata document

Question

Opened this issue 5 years ago · 22 comments

We ought to document all the shaping-related errata we (reasonably) can as well as describing "how it should work". Let's gather those here for the moment, with an eye towards calling the initial set complete in a couple of weeks.

So far:

- The GSUB spec says that MultipleSubst cannot be used to delete a glyph: it must always substitute at least one replacement glyph. Some implementations nevertheless allow the replacement glyph array to be zero-length.
- We have several Uniscribe-specific compatibility bugs listed in opentype-shaping-documents/notes/uniscribe-bug-compatibility.md
- The spec is ambiguous about which adjacent-mark sequences need reordering, as per #34 (comment)
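
To illustrate the MultipleSubst item, here is a minimal Python sketch of the two behaviours. The function name, the `allow_empty` switch, and the choice to leave the glyph untouched in the strict case are all assumptions made for illustration; the spec does not say what an implementation should do with a zero-length sequence.

```python
from typing import Dict, List


def apply_multiple_subst(
    glyphs: List[int],
    mapping: Dict[int, List[int]],
    allow_empty: bool = True,
) -> List[int]:
    """Apply a MultipleSubst-style mapping (one glyph -> sequence of glyphs).

    allow_empty=True models the lenient implementations, where a zero-length
    sequence deletes the glyph. allow_empty=False is a hypothetical strict
    mode that ignores such a sequence and keeps the original glyph.
    """
    out: List[int] = []
    for glyph in glyphs:
        sequence = mapping.get(glyph)
        if sequence is None:
            out.append(glyph)          # glyph not covered by the lookup
        elif not sequence and not allow_empty:
            out.append(glyph)          # strict reading: no deletion
        else:
            out.extend(sequence)       # an empty sequence deletes the glyph
    return out
```

For example, `apply_multiple_subst([10, 20, 30], {20: []})` drops glyph 20, while the same call with `allow_empty=False` returns the input unchanged.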

Answer 1 · 2019-03-25T21:59:00.000Z

The scriptListOffset, featureListOffset, and lookupListOffset fields in the GSUB/GPOS header may be NULL, despite the spec only suggesting that featureVariationsOffset may be NULL.
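
A minimal Python sketch of a header parser that tolerates this, treating any zero offset as "table absent". The field layout follows the GSUB/GPOS header (three Offset16 fields, plus an Offset32 featureVariationsOffset from version 1.1); the class and function names are made up for the example.

```python
import struct
from typing import NamedTuple, Optional


class LayoutHeader(NamedTuple):
    major: int
    minor: int
    script_list: Optional[int]         # None when the offset is NULL
    feature_list: Optional[int]
    lookup_list: Optional[int]
    feature_variations: Optional[int]


def parse_layout_header(table: bytes) -> LayoutHeader:
    """Parse a GSUB/GPOS header, mapping every NULL (0) offset to None."""
    major, minor, script, feature, lookup = struct.unpack_from(">HHHHH", table, 0)
    variations = 0
    if (major, minor) >= (1, 1):
        # featureVariationsOffset only exists in version 1.1 headers.
        (variations,) = struct.unpack_from(">I", table, 10)

    def opt(offset: int) -> Optional[int]:
        return offset or None

    return LayoutHeader(major, minor, opt(script), opt(feature), opt(lookup), opt(variations))
```

A caller can then skip script, feature, or lookup processing whenever the corresponding field is `None`, instead of rejecting the font.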

Answer 2 · 2019-03-25T22:03:33.000Z

I personally prefer the direction that any offset may be NULL...

Answer 3 · 2019-03-25T22:06:30.000Z

A NULL offset is at least a clear indication of a missing value, although it makes more sense for values which are explicitly optional.

The weird situation we've encountered with some fonts is very small but non-zero offsets, like "4", which aren't big enough to point outside of the current struct and clearly don't point to a valid value if you try to follow them. I'm not sure how font creation tools manage to make such mistakes.

Answer 4 · 2019-03-25T22:10:43.000Z

I can imagine an offset value of 4 pointing to two 0 bytes to be a valid encoding of an empty array...

And an offset to an empty array can be encoded as NULL. No need for wording to allow it.
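
As a rough Python sketch of both points: a reader for a count-prefixed uint16 array that accepts a NULL offset as "empty" and also accepts a small offset such as 4, provided it lands on a valid (possibly zero) count. The function name and the choice to raise `ValueError` on bad offsets are assumptions for the example.

```python
import struct
from typing import List


def read_uint16_array(data: bytes, base: int, offset: int) -> List[int]:
    """Read a uint16 count followed by that many uint16 values.

    offset is relative to base. A NULL (0) offset is read as an empty array.
    """
    if offset == 0:
        return []
    start = base + offset
    if start + 2 > len(data):
        raise ValueError("offset points outside the table")
    (count,) = struct.unpack_from(">H", data, start)
    if start + 2 + 2 * count > len(data):
        raise ValueError("array runs past the end of the table")
    return [
        struct.unpack_from(">H", data, start + 2 + 2 * i)[0]
        for i in range(count)
    ]
```

With `data = struct.pack(">6H", 4, 6, 0, 2, 7, 9)`, offset 4 points at two zero bytes and reads as an empty array, while offset 6 reads as the two values 7 and 9.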

Answer 5 · 2019-03-25T22:14:54.000Z

Yes, if it points to a valid value that's fine, although a little weird, but the reason we noticed in the first place is that it wasn't valid 😆

Something I wonder is whether an offset to a zero-sized object is allowed to point outside the file 🤔

Answer 6 · 2019-03-25T22:18:47.000Z

> Something I wonder is whether an offset to a zero-sized object is allowed to point outside the file 🤔

We don't allow that. I mean, we go ahead and "fix" it by rewriting the offset with NULL, so it doesn't make a difference.

Answer 7 · 2019-03-25T22:19:10.000Z

> > Something I wonder is whether an offset to a zero-sized object is allowed to point outside the file 🤔

> We don't allow that. I mean, we go ahead and "fix" it by rewriting the offset with NULL, so it doesn't make a difference.

And by we, I meant in HarfBuzz.
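
A small Python sketch of that idea, not HarfBuzz's actual code (its sanitizer is written in C++ and checks far more than this). The function name and parameters are hypothetical: `field_pos` is where the Offset16 field lives, `base` is what the offset is relative to, and `min_size` is the smallest valid size of the target object (0 for a zero-sized object).

```python
import struct


def neuter_bad_offset16(table: bytearray, field_pos: int, base: int, min_size: int) -> int:
    """Rewrite an out-of-bounds Offset16 to NULL in place; return the final offset."""
    (offset,) = struct.unpack_from(">H", table, field_pos)
    if offset != 0 and base + offset + min_size > len(table):
        # The target cannot fit inside the table: treat it as missing.
        struct.pack_into(">H", table, field_pos, 0)
        return 0
    return offset
```

After this pass, later code only has to handle two cases: a NULL offset or an offset whose target is at least `min_size` bytes inside the table.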

Answer 8 · 2019-03-25T22:23:00.000Z

Another thing that would be good to clarify is the way nested contextual lookups use their own lookup flag, but other lookups within a contextual lookup use the parent's lookup flag. I don't recall if the spec has anything to say about nested contextual lookups; it's also helpful to know whether the child's context can extend beyond the parent's, let alone the weirdness of using child MultipleSubst to delete the parent's context!

Answer 9 · 2019-03-25T22:24:58.000Z

Also GSUB lookups must be sorted by lookup index before being applied, but as I recall GPOS lookups must not?

Answer 10 · 2019-03-25T22:26:53.000Z

> Another thing that would be good to clarify is the way nested contextual lookups use their own lookup flag, but other lookups within a contextual lookup use the parent's lookup flag.

I think in harfbuzz we always use the child's flags. Do you have a test case that can reveal this?

> I don't recall if the spec has anything to say about nested contextual lookups; it's also helpful to know whether the child's context can extend beyond the parent's,

@litherum reports that Windows does not allow that, while HarfBuzz and CoreText do.
https://twitter.com/Litherum/status/1103911322872307715

> let alone the weirdness of using child MultipleSubst to delete the parent's context!

Deleting the parent's context is no different from ligating the parent's context, which is one of the examples in the AFDKO feature file format (matching "ffi", then one child ligating f+f, then another child ligating ff+i).
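
A toy Python simulation of that "ffi" case, to show how child lookups shrink the parent's matched context. Glyphs are plain strings, the helper names are invented, and the position bookkeeping is deliberately naive (real shapers track how match positions shift as the buffer changes length).

```python
from typing import Dict, List, Tuple

Ligatures = Dict[Tuple[str, ...], str]


def apply_ligature_at(glyphs: List[str], pos: int, ligs: Ligatures) -> None:
    """Apply the first ligature rule that matches at pos, editing glyphs in place."""
    for components, lig in ligs.items():
        n = len(components)
        if tuple(glyphs[pos:pos + n]) == components:
            glyphs[pos:pos + n] = [lig]
            return


def apply_context(
    glyphs: List[str],
    context: List[str],
    children: List[Tuple[int, Ligatures]],
) -> List[str]:
    """Where context matches, run each (sequence_index, child lookup) in order."""
    out = list(glyphs)
    i = 0
    while i < len(out):
        if out[i:i + len(context)] == context:
            for seq_index, ligs in children:
                apply_ligature_at(out, i + seq_index, ligs)
        i += 1
    return out
```

Running it on `["o", "f", "f", "i", "c", "e"]` with context `["f", "f", "i"]` and two children at sequence index 0, the first ligating `f`+`f` to `f_f` and the second ligating `f_f`+`i` to `f_f_i`, leaves a single glyph where the three-glyph context used to be.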

Answer 11 · 2019-03-25T22:28:06.000Z

> Also GSUB lookups must be sorted by lookup index before being applied, but as I recall GPOS lookups must not?

GPOS is mostly additive. I don't know what Windows does. But HarfBuzz sorts them. The spec clearly says lookups are applied in their numeric order. Of course there's the per-script lists...
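
A minimal Python sketch of "numeric order": gather the lookup indices referenced by the enabled features, de-duplicate them, and sort. The feature-to-lookup mapping is a simplified stand-in for the FeatureList, and the sketch ignores script and language-system selection entirely, so it says nothing about the per-script question.

```python
from typing import Dict, Iterable, List


def lookups_in_application_order(
    feature_lookups: Dict[str, List[int]],
    enabled: Iterable[str],
) -> List[int]:
    """Return the lookup indices of the enabled features in ascending order."""
    indices = set()
    for tag in enabled:
        indices.update(feature_lookups.get(tag, []))
    # Each lookup is applied once, in LookupList order, regardless of the
    # order in which the features happen to reference it.
    return sorted(indices)
```

For instance, with `{"kern": [7, 2], "mark": [5], "mkmk": [2, 9]}` and `kern` and `mark` enabled, the lookups would be applied as 2, 5, 7.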

Answer 12 · 2019-03-25T22:59:45.000Z

> I think in harfbuzz we always use the child's flags. Do you have a test case that can reveal this?

A quick test reveals that the Amiri fonts break if we use the child's lookup flag.

Answer 13 · 2019-03-25T23:16:20.000Z

> GPOS is mostly additive. I don't know what Windows does. But HarfBuzz sorts them. The spec clearly says lookups are applied in their numeric order. Of course there's the per-script lists...

Hmm I just tried sorting vs. not-sorting them and got the same results each time; I got different results with our old implementation, but that must have been due to a bug. 🤓

Answer 14 · 2019-03-26T00:07:14.000Z

> > I think in harfbuzz we always use the child's flags. Do you have a test case that can reveal this?

> A quick test reveals that the Amiri fonts break if we use the child's lookup flag.

(I really hope I've tested this correctly...)

Test font: Amiri v. 000.109
Test sequence: U+0646 (Letter Noon), U+0652 (Sukun), U+0628 (Letter Beh)

Lookup i: 139 (chaining contextual, lookup flag: 8 (ignore marks)) specifies a child lookup i: 109 (single, lookup flag: 0). Using the child's lookup flag appears to inhibit the substitution of glyph 2341 -> 3219, resulting in an output that looks like this:

as opposed to this, which uses the parent's lookup flag (this is how it looks with HarfBuzz/CoreText):

Answer 15 · 2019-03-26T21:47:55.000Z

> Lookup i: 139 (chaining contextual, lookup flag: 8 (ignore marks)) specifies a child lookup i: 109 (single, lookup flag: 0). Using the child's lookup flag appears to inhibit the substitution of glyph 2341 -> 3219, resulting in an output that looks like this:

> as opposed to this, which uses the parent's lookup flag (this is how it looks with HarfBuzz/CoreText):

That doesn't make sense. Why would an IgnoreMarks lookup flag inhibit a single substitution? I checked the HarfBuzz code again; we definitely use the child lookup flag.

Answer 16 · 2019-03-26T21:48:06.000Z

cc @khaledhosny FYI

Answer 17 · 2019-03-26T22:07:37.000Z

I don’t get this either. The lookups in question are basically:

lookup BaaNonIsol {
  sub @aBaa.init by @aBaa.init_BaaNonIsol;
  sub @aNon.fina by @aNon.fina_BaaNonIsol;
} BaaNonIsol;

lookup BaaNonIsolCalt {
  lookupflag IgnoreMarks;
  sub [@aBaa.init]' lookup BaaNonIsol
      [@aNon.fina]' lookup BaaNonIsol;
} BaaNonIsolCalt;

The contextual substitution lookup has the IgnoreMarks flag, as it should, so that the “U+0646 U+0628” sequence matches regardless of any intervening marks. The single substitution lookup does not have the IgnoreMarks flag because it wouldn’t make any difference: it applies to a single input glyph, so there are no marks in the input to ignore.
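
A toy Python model of this, assuming concrete glyph names as stand-ins for the class members (`baa.init`, `non.fina`, `sukun` are illustrative, not Amiri's real glyph names). The parent skips marks while matching its input sequence; the child single substitution is then applied only at the matched positions, so its own flag never comes into play.

```python
from typing import Dict, List, Optional, Set


def match_skipping(
    glyphs: List[str], start: int, pattern: List[Set[str]], skip: Set[str]
) -> Optional[List[int]]:
    """Match pattern from start, skipping glyphs in skip; return matched positions."""
    positions: List[int] = []
    i = start
    for wanted in pattern:
        while i < len(glyphs) and glyphs[i] in skip:
            i += 1
        if i >= len(glyphs) or glyphs[i] not in wanted:
            return None
        positions.append(i)
        i += 1
    return positions


def apply_calt(
    glyphs: List[str], pattern: List[Set[str]], single: Dict[str, str], marks: Set[str]
) -> List[str]:
    """Parent contextual lookup; marks is what its lookup flag tells it to ignore."""
    out = list(glyphs)
    for start in range(len(out)):
        if out[start] in marks:
            continue
        positions = match_skipping(out, start, pattern, marks)
        if positions is None:
            continue
        for pos in positions:
            # Child single substitution: one glyph in, one glyph out.
            out[pos] = single.get(out[pos], out[pos])
    return out
```

With the input `["baa.init", "sukun", "non.fina"]`, passing `marks={"sukun"}` substitutes both base glyphs and leaves the sukun in place, whereas passing an empty `marks` set (a parent without IgnoreMarks) makes the match fail and leaves the input unchanged.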

BTW, your input would give the output you show only if the text was processed LTR. I’m not sure if this was intentional, but I’d make sure Arabic text is tested in the RTL direction, as LTR doesn’t always give the expected output (and might give different results in different implementations).

Answer 18 · 2019-03-26T23:38:30.000Z

> BTW, your input would give the output you show only if the text was processed LTR. I’m not sure if this was intentional, but I’d make sure Arabic text is tested in the RTL direction, as LTR doesn’t always give the expected output (and might give different results in different implementations).

Sorry! This was unintentional on my part. We do test Arabic text in RTL, but when I was writing up my findings I somehow got it in my head that I needed to specify the input in reverse 🤦‍♂️.

> That doesn't make sense. Why would an IgnoreMarks lookup flag inhibit a single substitution? I checked the HarfBuzz code again; we definitely use the child lookup flag.

> The contextual substitution lookup has the IgnoreMarks flag, as it should, so that the “U+0646 U+0628” sequence matches regardless of any intervening marks. The single substitution lookup does not have the IgnoreMarks flag because it wouldn’t make any difference: it applies to a single input glyph, so there are no marks in the input to ignore.

Thank you for your responses! Makes sense.

Answer 19 · 2019-03-27T01:53:15.000Z

> Another thing that would be good to clarify is the way nested contextual lookups use their own lookup flag, but other lookups within a contextual lookup use the parent's lookup flag.

This was some confusion on my part: I was mixing up the use of the parent's lookup flag and the child's in a way that happened to make the tests pass, so I never realised. 😆

We've simplified the code now and it makes much more sense, thanks for your patience.

Answer 20 · 2019-10-21T08:48:54.000Z

> GPOS is mostly additive. I don't know what Windows does. But HarfBuzz sorts them. The spec clearly says lookups are applied in their numeric order. Of course there's the per-script lists...

So, is there an ambiguity regarding the per-script lists?

Answer 21 · 2020-04-01T12:11:22.000Z

> > GPOS is mostly additive. I don't know what Windows does. But HarfBuzz sorts them. The spec clearly says lookups are applied in their numeric order. Of course there's the per-script lists...

> So, is there an ambiguity regarding the per-script lists?

Following up on this, my guess would be that this means it's ambiguous how to sort the GPOS lookups that are script-tagged together with the GPOS lookups that are generic/default (dflt?). Is that the concern?

If there's something here, I'll add it to errata.

Answer 22 · 2020-04-02T11:42:07.000Z

Noting the nested-contextual-lookups issue detailed in allsorts #25.