notofonts/nastaliq

Overproductive theonym lookups in Noto Nastaliq Urdu

Status: closed · 3 comments

Font

NotoNastaliqUrdu-Regular.otf

Where the font came from, and when

Site: https://github.com/googlefonts/noto-fonts/blob/f7556efcaf940cf6c2f8ae0c3386d651e930b5cc/unhinted/otf/NotoNastaliqUrdu/NotoNastaliqUrdu-Regular.otf
Date: 2022-02-03

Font version

Version 3.002

Issue

Noto Nastaliq Urdu version 3.002 (not to be confused with yesterday’s build, also called 3.002) ligates some <lam, lam, heh> sequences that are clearly unrelated to the word “اللّٰه” and should not be ligated. If the heh is not really a heh but another letter that the font converts into a heh glyph, the ligature should not be formed. If the diacritics are incompatible with the pronunciation “llah”, the ligature should not be formed.

My rationale is that other <lam, lam, heh> sequences are not ligated. For example, adding a kasra, or adding ijam to a lam, blocks the ligature. I conclude that this ligature depends on the meaning of the word and not just on the shapes of the glyphs. That’s why I include the second example below, even though ae and heh are otherwise indistinguishable in final position.

Character data

للّٰة
U+0644 ARABIC LETTER LAM
U+0644 ARABIC LETTER LAM
U+0651 ARABIC SHADDA
U+0670 ARABIC LETTER SUPERSCRIPT ALEF
U+0629 ARABIC LETTER TEH MARBUTA

للّٰە
U+0644 ARABIC LETTER LAM
U+0644 ARABIC LETTER LAM
U+0651 ARABIC SHADDA
U+0670 ARABIC LETTER SUPERSCRIPT ALEF
U+06D5 ARABIC LETTER AE

لَله
U+0644 ARABIC LETTER LAM
U+064E ARABIC FATHA
U+0644 ARABIC LETTER LAM
U+0647 ARABIC LETTER HEH

Screenshots

[Images not preserved: screenshots of للّٰة, للّٰە and لَله as rendered in the font.]

Okay. It seems like the answer is to move this lookup earlier, before all the dot decomposition turns other characters into heh+marks. I can do that, but what I don’t have is a good set of rules for which marks should allow and which marks should block the lookup. Do you mind spelling that out for me?
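A toy model makes the ordering problem easier to see. In the Python sketch below, `decompose_dots` is a hypothetical stand-in for the font's dot-decomposition lookups. It assumes that teh marbuta becomes heh plus a dot mark and that ae becomes plain heh. `lam_lam_heh` is a deliberately simple check that skips marks and looks only at the first three letters. Neither function corresponds to actual code in the font.

```python
import unicodedata

LAM, HEH = "\u0644", "\u0647"
TEH_MARBUTA, AE = "\u0629", "\u06D5"
# Hypothetical placeholder for the dot-mark glyph that decomposition splits off;
# a private-use code point is used only so it can live in an ordinary string.
DOTS_MARK = "\uE000"


def decompose_dots(text: str) -> str:
    """Hypothetical dot decomposition: teh marbuta -> heh + dot mark, ae -> heh."""
    return text.replace(TEH_MARBUTA, HEH + DOTS_MARK).replace(AE, HEH)


def is_mark(ch: str) -> bool:
    # Nonspacing marks (fatha, shadda, superscript alef, ...) plus the placeholder.
    return ch == DOTS_MARK or unicodedata.category(ch) == "Mn"


def lam_lam_heh(text: str) -> bool:
    """True if the letters, with all marks skipped, begin with lam, lam, heh."""
    letters = [ch for ch in text if not is_mark(ch)]
    return letters[:3] == [LAM, LAM, HEH]


samples = {
    "teh marbuta": "\u0644\u0644\u0651\u0670\u0629",
    "ae": "\u0644\u0644\u0651\u0670\u06D5",
}
for name, word in samples.items():
    print(name, "before:", lam_lam_heh(word), "after:", lam_lam_heh(decompose_dots(word)))
```

Both words match only after decomposition. Running the check before that stage therefore keeps them out, whatever mark rules are chosen.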

According to https://github.com/googlefonts/noto-fonts/issues/384#issuecomment-110829445 and https://github.com/googlefonts/noto-fonts/issues/384#issuecomment-110836010, it should be something like:

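lookup FormDivineName {
    # Ligate lam + lam + heh into the divine-name glyph, skipping over any marks between them
    lookupflag IgnoreMarks;
    sub LamIni LamMed HehFin by Divine_nm_p1;
} FormDivineName;
lookup DivNmCheck {
    # Call FormDivineName only when the second lam carries shadda followed by superscript alef or fatha;
    # no lookupflag here, so any other mark in the sequence (e.g. a fatha on the first lam) blocks the match
    sub LamIni' lookup FormDivineName LamMed' ShaddaNS' [AlefSuperiorNS FathaNS]' HehFin';
} DivNmCheck;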
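Candidate inputs can be checked against the DivNmCheck rule with a rough code-point model. The Python sketch below is an assumption-laden model, not the font's lookup, for three reasons. It works on Unicode characters rather than glyphs. It takes the cluster starting at the first lam and assumes the two lams and the heh already take their initial, medial and final forms. It also assumes the marks on the second lam arrive in the order shadda, then superscript alef or fatha, as the rule expects. `div_nm_check` is a hypothetical name.

```python
LAM = "\u0644"
SHADDA = "\u0651"
SUPERSCRIPT_ALEF = "\u0670"
FATHA = "\u064E"
HEH = "\u0647"


def div_nm_check(cluster: str) -> bool:
    """Code-point model of DivNmCheck: lam, lam, shadda, superscript alef or fatha, heh.

    The five positions must be consecutive, so an extra mark anywhere among them
    (for example a fatha on the first lam) blocks the match, and so does any
    final letter other than U+0647 ARABIC LETTER HEH. Anything after the heh is
    not examined, mirroring a contextual rule with no lookahead.
    """
    if len(cluster) < 5:
        return False
    lam1, lam2, shadda, vowel, heh = cluster[:5]
    return (
        lam1 == LAM
        and lam2 == LAM
        and shadda == SHADDA
        and vowel in (SUPERSCRIPT_ALEF, FATHA)
        and heh == HEH
    )


# The three inputs from this issue, plus the intended spelling for comparison.
cases = {
    "intended (heh)": LAM + LAM + SHADDA + SUPERSCRIPT_ALEF + HEH,
    "teh marbuta": LAM + LAM + SHADDA + SUPERSCRIPT_ALEF + "\u0629",
    "ae": LAM + LAM + SHADDA + SUPERSCRIPT_ALEF + "\u06D5",
    "fatha on first lam": LAM + FATHA + LAM + HEH,
}
for label, cluster in cases.items():
    print(f"{label}: {div_nm_check(cluster)}")
```

In this model only the intended spelling passes. The teh marbuta and ae examples fail on the final letter, and the third example fails because the fatha sits between the two lams.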

The following rule could also go in DivNmCheck, but I don’t know whether it is necessary. Do people ever write exactly one of the two diacritics (only the shadda, or only the superscript alef or fatha)?

    sub LamIni' lookup FormDivineName LamMed' [ShaddaNS AlefSuperiorNS FathaNS]' HehFin';
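The sketch below estimates what adding this second rule would change. It enumerates every sequence of zero to two marks on the second lam, drawn from shadda, superscript alef, fatha and kasra, and lists the ones that only the single-mark rule accepts. The string names are hypothetical stand-ins for the ShaddaNS, AlefSuperiorNS and FathaNS glyph classes. Kasra is included only as a mark that neither rule allows. The model assumes each rule behaves exactly as its glyph sequence reads.

```python
from itertools import product

SHADDA, SUPERSCRIPT_ALEF, FATHA, KASRA = "shadda", "superscript alef", "fatha", "kasra"
MARKS = [SHADDA, SUPERSCRIPT_ALEF, FATHA, KASRA]


def strict(marks: tuple) -> bool:
    # Original DivNmCheck rule: ShaddaNS' [AlefSuperiorNS FathaNS]'
    return len(marks) == 2 and marks[0] == SHADDA and marks[1] in (SUPERSCRIPT_ALEF, FATHA)


def single(marks: tuple) -> bool:
    # Proposed extra rule: [ShaddaNS AlefSuperiorNS FathaNS]' (exactly one mark)
    return len(marks) == 1 and marks[0] in (SHADDA, SUPERSCRIPT_ALEF, FATHA)


def newly_accepted() -> list:
    """Mark sequences (0 to 2 marks on the second lam) that only the extra rule accepts."""
    result = []
    for n in range(3):
        for marks in product(MARKS, repeat=n):
            if single(marks) and not strict(marks):
                result.append(marks)
    return result


print(newly_accepted())
```

The extra rule therefore admits only the three lone-mark spellings. In this model, a lam-lam-heh with no marks at all still matches neither rule.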

I've gone with your suggestion.