matplotlib/mplcairo

Using locl features available in a font

andjc opened this issue · 11 comments

andjc commented

The current documentation and examples show how to control the OpenType features available in a font, but there are no examples of how to select the OpenType language system that is used.

Is specifying the language system currently supported in mplcairo, and if so, how do I specify the language system to be used for text rendering?

I don't know anything about the locl system. Does the syntax at https://github.com/matplotlib/mplcairo#font-formats-and-features not work? If you need more info, you will need to provide a font with multiple localized forms and a series of glyphs with which I can test that myself.

andjc commented

@anntzer

By itself, passing the locl OpenType feature would do little. I assume raqm requires sufficient information to identify the OpenType script and OpenType language system to use, falling back to DFLT.dflt in the absence of any other info.

Any of the pan-CJK fonts that support language specific variant Han ideographs would be good tests.

Below are some specific suggestions. I've included the BCP 47 language tags and a list of the script/language systems supported by each font.

Gentium Plus: https://software.sil.org/downloads/r/gentium/GentiumPlus-6.101.zip

String1: Ấấ Ầầ Ẩẩ Ẫẫ Ắắ Ằằ Ẳẳ Ẵẵ Ếế Ềề Ểể Ễễ Ốố Ồồ Ổổ Ỗỗ
Diacritic stacking will change for Vietnamese language system.
lang="vi"

String2: б г д п т ѓ
In Italic typeface, Serbian and Macedonian will use alternative glyphs.
lang="sr" or lang="mk"

> otfinfo -s Gentium_Plus_Regular.ttf 
DFLT		Default
cyrl		Cyrillic
cyrl.MKD	Cyrillic/Macedonian
cyrl.SRB	Cyrillic/Serbian
grek		Greek
latn		Latin
latn.IPPH	Latin/Phonetic transcription—IPA conventions
latn.VIT	Latin/Vietnamese
>
> otfinfo -s Gentium_Plus_Italic.ttf 
DFLT		Default
cyrl		Cyrillic
cyrl.MKD	Cyrillic/Macedonian
cyrl.SRB	Cyrillic/Serbian
grek		Greek
latn		Latin
latn.IPPH	Latin/Phonetic transcription—IPA conventions
latn.VIT	Latin/Vietnamese

Scheherazade New: https://software.sil.org/downloads/r/scheherazade/ScheherazadeNew-3.300.zip

String3: ه ههه
Alternative glyphs used for Kurdish
lang='ku'

String4: م ممم ۶ ۷ بِّ
Alternative glyphs used for Sindhi
lang='sd'

> otfinfo -s Scheherazade_New_Regular.ttf 
arab		Arabic
arab.KIR	Arabic/Kirghiz
arab.KUR	Arabic/Kurdish
arab.RHG	Arabic/<unknown language>
arab.SND	Arabic/Sindhi
arab.URD	Arabic/Urdu
arab.WLF	Arabic/Wolof
latn		Latin

Padauk: https://software.sil.org/downloads/r/padauk/Padauk-5.000.zip

String5: က︀ ၵ︀ ꩡ︀ ယ︀ လ︀ ၸ︀ ၺ ꩺ
Alternative glyphs used for Tai Aiton and Tai Phake
lang="aio" or lang="phk"

String6: ကှ​ ကှု ကှူ ကွ ကျွ ကြွ ကွှ
Alternative glyphs used for Kayah
lang="kyu"

String7: တွ တျွ တြွ တွှ
Alternative glyphs used for Shan
lang="shn"

N.B. Padauk supports both the mymr and mym2 OpenType script tags.

> otfinfo -s Padauk_Regular.ttf 
DFLT		Default
DFLT.CSH	Default/<unknown language>
DFLT.KHN	Default/<unknown language>
DFLT.KHT	Default/<unknown language>
DFLT.KSW	Default/<unknown language>
DFLT.KYU	Default/<unknown language>
DFLT.SHN	Default/Shan
mym2		<unknown script>
mym2.CSH	<unknown script>/<unknown language>
mym2.KHN	<unknown script>/<unknown language>
mym2.KHT	<unknown script>/<unknown language>
mym2.KSW	<unknown script>/<unknown language>
mym2.KYU	<unknown script>/<unknown language>
mym2.SHN	<unknown script>/Shan
mymr		Myanmar
mymr.CSH	Myanmar/<unknown language>
mymr.KHN	Myanmar/<unknown language>
mymr.KHT	Myanmar/<unknown language>
mymr.KSW	Myanmar/<unknown language>
mymr.KYU	Myanmar/<unknown language>
mymr.SHN	Myanmar/Shan
mymr.dlft	Myanmar/<unknown language>
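
For anyone who wants to check which language systems a font declares before testing, the `otfinfo -s` listings above can be grouped by script with a few lines of Python. This is only a sketch: the function name `parse_otfinfo_scripts` is made up for illustration, and the sample text is a trimmed copy of the Gentium Plus listing with the columns separated by spaces.

```python
def parse_otfinfo_scripts(output):
    """Group the language-system tags from `otfinfo -s` output by script tag."""
    systems = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[0] == ">":
            continue  # skip blank lines and shell prompt lines
        # The first column is "script" or "script.LANG".
        script, _, language = fields[0].partition(".")
        languages = systems.setdefault(script, [])
        if language:
            languages.append(language)
    return systems


# Trimmed copy of the Gentium Plus listing quoted in this thread.
SAMPLE = """\
> otfinfo -s Gentium_Plus_Regular.ttf
DFLT      Default
cyrl      Cyrillic
cyrl.MKD  Cyrillic/Macedonian
cyrl.SRB  Cyrillic/Serbian
grek      Greek
latn      Latin
latn.IPPH Latin/Phonetic transcription
latn.VIT  Latin/Vietnamese
"""

gentium_systems = parse_otfinfo_scripts(SAMPLE)
print(gentium_systems["cyrl"])
```

A script with an empty list (such as grek here) only declares its default language system.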

One of the benefits of supporting locl is that not all variations supported by a language system are exposed as features in a font. It also spares Python developers from needing to know their way around the internals of each font's OpenType features.

Thank you for the reference. This feature is not currently available in mplcairo, but I would probably accept a PR adding support for it using an extension of the OpenType feature syntax, i.e. font=Path("/path/to/font.ttf|frac,onum,locl=vi,..."). AFAICS this syntax has the advantage of also allowing one to set the feature on a certain character subrange without introducing a new API. Does this seem reasonable to you? Do you foresee problems with this approach?

I'd not overload the locl feature tag; something like language=XXXX would be better (the language can affect any feature, not just locl).

Sure, that seems reasonable too.

I think the following patch is sufficient (it works locally for me on the Gentium Cyrillic example). Can you confirm?

diff --git i/src/_raqm.h w/src/_raqm.h
index 471d3af..adf00cf 100644
--- i/src/_raqm.h
+++ w/src/_raqm.h
@@ -13,6 +13,7 @@ extern "C" {  // Support raqm<=0.2.
   _(get_glyphs) \
   _(layout) \
   _(set_freetype_face) \
+  _(set_language) \
   _(set_text_utf8) \
   _(version_string) \
   _(version_atleast)
diff --git i/src/_util.cpp w/src/_util.cpp
index e5cbecd..1d43a25 100644
--- i/src/_util.cpp
+++ w/src/_util.cpp
@@ -797,7 +797,13 @@ GlyphsAndClusters text_to_glyphs_and_clusters(cairo_t* cr, std::string s)
          *static_cast<std::vector<std::string>*>(
            cairo_font_face_get_user_data(
              cairo_get_font_face(cr), &detail::FEATURES_KEY))) {
-      TRUE_CHECK(raqm::add_font_feature, rq, feature.c_str(), -1);
+      auto lang_tag = "language="s;
+      if (feature.substr(0, lang_tag.size()) == lang_tag) {
+        TRUE_CHECK(raqm::set_language,
+                   rq, feature.c_str() + lang_tag.size(), 0, s.size());
+      } else {
+        TRUE_CHECK(raqm::add_font_feature, rq, feature.c_str(), -1);
+      }
     }
     TRUE_CHECK(raqm::layout, rq);
     auto num_glyphs = size_t{};
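
For readers who do not follow the C++, the dispatch in the patch can be paraphrased in Python roughly as follows. This is a sketch, not mplcairo code: `plan_raqm_calls` is a hypothetical name, and the tuples merely stand in for the `raqm::set_language` and `raqm::add_font_feature` calls. The length is computed on the UTF-8 encoding because the patch passes `s.size()`, the byte length of the `std::string`.

```python
LANG_PREFIX = "language="


def plan_raqm_calls(features, text):
    """Return the raqm calls the patch would make, as plain tuples."""
    calls = []
    byte_length = len(text.encode("utf-8"))  # mirrors s.size() in the patch
    for feature in features:
        if feature.startswith(LANG_PREFIX):
            # Entries of the form "language=<tag>" select a language for the whole string.
            tag = feature[len(LANG_PREFIX):]
            calls.append(("set_language", tag, 0, byte_length))
        else:
            # Everything else is forwarded unchanged as a font feature.
            calls.append(("add_font_feature", feature, -1))
    return calls


planned = plan_raqm_calls(["frac", "language=sr"], "б г")
print(planned)
```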

Supporting different languages over a single string (perhaps reusing an indexing syntax like https://harfbuzz.github.io/harfbuzz-hb-common.html#hb-feature-from-string) would be left as an exercise to the reader...

(@khaledhosny Would it be safe for the tag to be named "lang" instead of "language"? (i.e. will there ever be a font feature which is actually called "lang"?) This would perhaps allow "abusing" hb_feature_from_string to support indexing syntax here.)

> Would it be safe for the tag to be named "lang" instead of "language"?

Feature tags can be any four bytes, so nothing prevents a font from having a lang feature, and one can never know what features would be registered in the future.
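
To make the collision concern concrete: going only by the four-byte rule stated above, "lang" has the shape of a feature tag, while the eight-byte "language" does not, which is presumably why the longer key cannot be mistaken for one. The check below is a minimal sketch with a hypothetical helper name; it tests nothing but the length and does not consult any registry of known features.

```python
def could_be_feature_tag(name):
    """True if `name` occupies exactly four bytes, the size of an OpenType tag."""
    return len(name.encode("utf-8")) == 4


for candidate in ("locl", "lang", "language"):
    print(candidate, could_be_feature_tag(candidate))
```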

OK, I'll stick to the patch above for now (if either you or @andjc can confirm that it works) and defer slicing syntax to another time, then.

The above patch is now in master. Leaving open as we may consider implementing slicing later.
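
With the patch applied, a language can be requested by adding a `language=<tag>` entry to the feature list after the `|` in the font path. The helper below is a small sketch for assembling such a string; `font_spec` is a hypothetical name and the font path is a placeholder. The tag values are the language tags suggested earlier in this thread (for example "sr" or "vi").

```python
from typing import Optional


def font_spec(font_path: str, *features: str, language: Optional[str] = None) -> str:
    """Build a "path|feature,feature,language=tag" string for the font argument."""
    entries = list(features)
    if language is not None:
        entries.append("language=" + language)
    if not entries:
        return font_path  # nothing to append, so leave the path untouched
    return font_path + "|" + ",".join(entries)


# Placeholder path; point it at a real copy of the italic Gentium Plus font.
spec = font_spec("/path/to/Gentium_Plus_Italic.ttf", language="sr")
print(spec)
```

The result would then be wrapped in a `pathlib.Path` and passed as in the syntax discussed above, e.g. `ax.text(0.5, 0.5, "б г д п т ѓ", font=Path(spec))`, assuming an mplcairo build with raqm support.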

Also pushed support for slicing. Thus closing, but feel free to ping for reopen in case I missed anything.