Overproductive theonym lookups in Noto Nastaliq Urdu
Font
NotoNastaliqUrdu-Regular.otf
Where the font came from, and when
Site: https://github.com/googlefonts/noto-fonts/blob/f7556efcaf940cf6c2f8ae0c3386d651e930b5cc/unhinted/otf/NotoNastaliqUrdu/NotoNastaliqUrdu-Regular.otf
Date: 2022-02-03
Font version
Version 3.002
Issue
Noto Nastaliq Urdu version 3.002 (not to be confused with yesterday’s build, also called 3.002) ligates some <lam, lam, heh> sequences which are clearly unrelated to the word “اللّٰه” and should not be ligated. If the heh is not really a heh but another letter that the font converts into a heh glyph, the ligature should not be formed. If the diacritics are incompatible with the pronunciation “llah”, the ligature should not be formed.
My rationale is that other <lam, lam, heh> sequences are not ligated; e.g. inserting a kasra, or ijam on a lam, blocks the ligature. I conclude that this ligature depends on the meaning of the word and not just the shapes of the glyphs. That’s why I include the second example below, though ae and heh are otherwise indistinguishable in final position.
Character data
للّٰة
U+0644 ARABIC LETTER LAM
U+0644 ARABIC LETTER LAM
U+0651 ARABIC SHADDA
U+0670 ARABIC LETTER SUPERSCRIPT ALEF
U+0629 ARABIC LETTER TEH MARBUTA
للّٰە
U+0644 ARABIC LETTER LAM
U+0644 ARABIC LETTER LAM
U+0651 ARABIC SHADDA
U+0670 ARABIC LETTER SUPERSCRIPT ALEF
U+06D5 ARABIC LETTER AE
لَله
U+0644 ARABIC LETTER LAM
U+064E ARABIC FATHA
U+0644 ARABIC LETTER LAM
U+0647 ARABIC LETTER HEH
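For anyone who wants to reproduce the three test strings, here is a minimal Python sketch that rebuilds them from the code points listed above. It uses only the standard-library `unicodedata` module; the names `SAMPLES` and `describe` are mine and do not come from the font sources.

```python
import unicodedata

# The three sequences from the character data above, written as escapes
# so that they survive copy and paste.
SAMPLES = [
    "\u0644\u0644\u0651\u0670\u0629",  # ends in TEH MARBUTA
    "\u0644\u0644\u0651\u0670\u06D5",  # ends in AE
    "\u0644\u064E\u0644\u0647",        # FATHA between the two lams
]


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample)
        for line in describe(sample):
            print(line)
```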
Screenshots
Okay. It seems like the answer is to move this lookup earlier, before all the dot decomposition turns other characters into heh+marks. I can do that, but what I don’t have is a good set of rules for which marks should allow and which marks should block the lookup. Do you mind spelling that out for me?
According to https://github.com/googlefonts/noto-fonts/issues/384#issuecomment-110829445 and https://github.com/googlefonts/noto-fonts/issues/384#issuecomment-110836010, it should be something like:
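# Ligate lam + lam + heh, skipping any marks between the base glyphs.
lookup FormDivineName {
    lookupflag IgnoreMarks;
    sub LamIni LamMed HehFin by Divine_nm_p1;
} FormDivineName;

# Only form the ligature when the second lam carries a shadda followed by
# a superscript alef or a fatha.
lookup DivNmCheck {
    sub LamIni' lookup FormDivineName LamMed' ShaddaNS' [AlefSuperiorNS FathaNS]' HehFin';
} DivNmCheck;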
The following rule might be okay in DivNmCheck too, but then again I don’t know if it is necessary. Do people ever write exactly one of the two diacritics?
sub LamIni' lookup FormDivineName LamMed' [ShaddaNS AlefSuperiorNS FathaNS]' HehFin';
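To make the intended behaviour concrete, here is a rough character-level model of the two candidate rules in Python. It is only a sketch under some assumptions: the font matches glyphs (LamIni, HehFin, and so on), not code points, and the model assumes the check runs before any decomposition turns other letters into heh glyphs. `would_ligate` and its `allow_single_mark` flag are made-up names for illustration. Positional forms and any mark reordering done by the shaper are not modelled.

```python
import re

LAM, HEH = "\u0644", "\u0647"
SHADDA, SUPERSCRIPT_ALEF, FATHA = "\u0651", "\u0670", "\u064E"

# Model of the strict rule: lam, lam, shadda, then superscript alef or fatha, heh.
STRICT = re.compile(f"{LAM}{LAM}{SHADDA}[{SUPERSCRIPT_ALEF}{FATHA}]{HEH}")

# Model of the optional extra rule: exactly one of the three marks on the second lam.
SINGLE_MARK = re.compile(f"{LAM}{LAM}[{SHADDA}{SUPERSCRIPT_ALEF}{FATHA}]{HEH}")


def would_ligate(text, allow_single_mark=False):
    """Return True if this toy model would form the ligature for text."""
    if STRICT.fullmatch(text):
        return True
    return bool(allow_single_mark and SINGLE_MARK.fullmatch(text))


if __name__ == "__main__":
    reported = [
        "\u0644\u0644\u0651\u0670\u0629",  # teh marbuta instead of heh
        "\u0644\u0644\u0651\u0670\u06D5",  # ae instead of heh
        "\u0644\u064E\u0644\u0647",        # fatha on the first lam
    ]
    for s in reported:
        print([f"U+{ord(c):04X}" for c in s], would_ligate(s))
```

Under this model none of the three reported strings ligate, while lam, lam, shadda, superscript alef, heh does. A string with only a shadda on the second lam ligates only when `allow_single_mark=True`, which is the case the question above is about.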
I've gone with your suggestion.