n8willis/opentype-shaping-documents

[Indic] Recomposing Bengali "Ya, Nukta" to "Yya"

adrianwong opened this issue · 10 comments

HarfBuzz specially recomposes Bengali "Ya, Nukta" to "Yya". I've traced the change all the way back to this commit.

Any ideas as to why this might have been done?
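For reference, here is a minimal sketch of the behaviour in question. It is not HarfBuzz's actual code, and `recompose_ya_nukta` is a hypothetical helper named only for illustration. It also shows why the recomposition has to be a special case: Yya (U+09DF) decomposes canonically to Ya (U+09AF) plus Nukta (U+09BC), but it is a Unicode composition exclusion, so ordinary NFC normalization never puts the pair back together.

```python
import unicodedata

YA = "\u09AF"     # BENGALI LETTER YA
NUKTA = "\u09BC"  # BENGALI SIGN NUKTA
YYA = "\u09DF"    # BENGALI LETTER YYA


def recompose_ya_nukta(text: str) -> str:
    """Hypothetical helper: replace every <Ya, Nukta> pair with Yya."""
    return text.replace(YA + NUKTA, YYA)


# Yya canonically decomposes to <Ya, Nukta> ...
assert unicodedata.normalize("NFD", YYA) == YA + NUKTA
# ... but NFC leaves the pair alone, because Yya is a composition exclusion.
assert unicodedata.normalize("NFC", YA + NUKTA) == YA + NUKTA
# The special-case recomposition is what maps the pair to Yya.
assert recompose_ya_nukta(YA + NUKTA) == YYA
```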

Guess only Behdad can tell the original intention. My idea though:

When a font does not do `sub ya nukta by yya;` itself, but the intention is to allow only a nukta-less ya to form yasign, a subsequent `sub virama ya by yasign;` can have an unexpected result in the context of `<base, virama, ya, nukta>`, because the possibility of a following nukta is overlooked. I've seen such fonts, and user reports might have led to such a patch in HB, although I don't think shaping engines should get into this business.
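To make the failure mode concrete, here is a toy simulation of the two substitutions. It is only a sketch: the glyph names (`ka`, `ya`, `nukta`, `yya`, `virama`, `yasign`) and the `apply_rule` and `shape` helpers are hypothetical, and real GSUB lookup processing is considerably more involved than a list search.

```python
def apply_rule(glyphs, match, replacement):
    """Replace each occurrence of the glyph sequence `match` with `replacement`."""
    out, i, n = [], 0, len(match)
    while i < len(glyphs):
        if glyphs[i:i + n] == match:
            out.extend(replacement)
            i += n
        else:
            out.append(glyphs[i])
            i += 1
    return out


def shape(glyphs, recompose_first):
    """Apply the yasign rule, optionally after recomposing <ya, nukta> to yya."""
    if recompose_first:
        # Stands in for the shaper (or the font) doing `sub ya nukta by yya;`.
        glyphs = apply_rule(glyphs, ["ya", "nukta"], ["yya"])
    # Stands in for the font's `sub virama ya by yasign;`.
    return apply_rule(glyphs, ["virama", "ya"], ["yasign"])


run = ["ka", "virama", "ya", "nukta"]
without_fix = shape(run, recompose_first=False)  # ['ka', 'yasign', 'nukta']
with_fix = shape(run, recompose_first=True)      # ['ka', 'virama', 'yya']
```

Without the recomposition, the yasign rule consumes the ya and leaves a stray nukta behind. With it, the ya is already part of yya by the time the yasign rule runs, so the rule no longer matches.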

I think this was done to prevent Ya, Nukta from forming yaphala.

Thanks @lianghai, @behdad. Wanting to prevent "Yaphala" makes perfect sense, although I do agree that that responsibility should ideally be the font's.

It sounds like the rough consensus here is that this is not a spec issue. Is that right?

Alternatively, do we want to document it less formally, such as in an inline Note?

Bit of a tough one. It would be nice to document All The Things, but in this case I'm inclined to agree that it's not a spec issue.

It's a workaround for fonts that aren't behaving as they should, and I don't think it's in the scope of the spec to address that. From what little I know, it feels like down that path lies madness.

For now, I think a happy medium would be for me to add a descriptive comment in the code, and link back to this issue.

If not to document the peculiarities and corner cases, then what is the point of this project?

I feel that the goal is to have a specification that allows the creation of interoperable implementations without requiring additional reverse engineering of existing implementations, so if there are peculiarities that must be understood in order to do this it would be good to have them in an implementation note of some kind.

> If not to document the peculiarities and corner cases, then what is the point of this project?

I was hesitant to capture it, as this issue has less to do with an inadequacy in the existing OpenType literature, and more to do with being an implementation workaround for a particular set of fonts.

As to how many fonts currently rely on this workaround such that it is now effectively a specification, I have no idea.

After mulling it over, perhaps what Nathan and Michael have suggested is a better solution than what I'd proposed.

Also, Behdad, if you have any good suggestions to make, I'm all ears.

To me, the question here is whether we ought to specify how people handle the potential problem, or just note that it's a problem and give them advice.

Kinda like how we say you have to 'tag consonants that have a below base form' in the text, but just offer the advice that 'you could do that by maintaining a static table of character properties for every codepoint, but it's more reliable to look in the font's GSUB'.

Or (possibly more related), we put the locl feature application where the MS OpenType docs say it should be done, but we also note that you can do it earlier. And we say you have to decompose your compound matras by the time you're done with step 2.2, but we also note that you could do that earlier, too.

(Note that I'm assuming whatever we say about this would go in both step 2.4 and 3.2)

I believe this is fixed by bd1c074 -- although of course, it's a wording patch. So I'll leave this open for a few more days in case anyone has further comments or thinks the change needs work.