Using locl features available in a font
andjc opened this issue · 11 comments
Current documentation and examples show how to control opentype features available in a font, but there are no examples of how to control the opentype language system that is used.
Is specifying the language system currently supported in mplcairo, and if so, how do I specify the language system to be used for text rendering?
I don't know anything about the locl system, does the syntax at https://github.com/matplotlib/mplcairo#font-formats-and-features not work? If you need more info, you will need to provide a font with multiple localized forms and a series of glyphs with which I can test that myself.
By itself, passing the locl OpenType feature would do little. I assume raqm requires sufficient information to identify the required OpenType script and OpenType language system to use, falling back to DFLT.dflt in the absence of any other info.
Any of the pan-CJK fonts that support language specific variant Han ideographs would be good tests.
Below are some specific suggestions. I've included the BCP 47 language tags and a list of the script/language systems specifically supported by each font.
Gentium Plus: https://software.sil.org/downloads/r/gentium/GentiumPlus-6.101.zip
String1: Ấấ Ầầ Ẩẩ Ẫẫ Ắắ Ằằ Ẳẳ Ẵẵ Ếế Ềề Ểể Ễễ Ốố Ồồ Ổổ Ỗỗ
Diacritic stacking will change for Vietnamese language system.
lang="vi"
String2: б г д п т ѓ
In Italic typeface, Serbian and Macedonian will use alternative glyphs.
lang="sr" or lang="mk"
> otfinfo -s Gentium_Plus_Regular.ttf
DFLT Default
cyrl Cyrillic
cyrl.MKD Cyrillic/Macedonian
cyrl.SRB Cyrillic/Serbian
grek Greek
latn Latin
latn.IPPH Latin/Phonetic transcription—IPA conventions
latn.VIT Latin/Vietnamese
>
> otfinfo -s Gentium_Plus_Italic.ttf
DFLT Default
cyrl Cyrillic
cyrl.MKD Cyrillic/Macedonian
cyrl.SRB Cyrillic/Serbian
grek Greek
latn Latin
latn.IPPH Latin/Phonetic transcription—IPA conventions
latn.VIT Latin/Vietnamese
Scheherazade New: https://software.sil.org/downloads/r/scheherazade/ScheherazadeNew-3.300.zip
String3: ه ههه
Alternative glyphs used for Kurdish
lang='ku'
String4: م ممم ۶ ۷ بِّ
Alternative glyphs used for Sindhi
lang='sd'
> otfinfo -s Scheherazade_New_Regular.ttf
arab Arabic
arab.KIR Arabic/Kirghiz
arab.KUR Arabic/Kurdish
arab.RHG Arabic/<unknown language>
arab.SND Arabic/Sindhi
arab.URD Arabic/Urdu
arab.WLF Arabic/Wolof
latn Latin
Padauk: https://software.sil.org/downloads/r/padauk/Padauk-5.000.zip
String5: က︀ ၵ︀ ꩡ︀ ယ︀ လ︀ ၸ︀ ၺ ꩺ
Alternative glyphs used for Tai Aiton and Tai Phake
lang="aio" or lang="phk"
String6: ကှ ကှု ကှူ ကွ ကျွ ကြွ ကွှ
Alternative glyphs used for Kayah
lang="kyu"
String7: တွ တျွ တြွ တွှ
Alternative glyphs used for Shan
lang="shn"
N.B. Padauk supports both the mymr and mym2 OpenType script tags
> otfinfo -s Padauk_Regular.ttf
DFLT Default
DFLT.CSH Default/<unknown language>
DFLT.KHN Default/<unknown language>
DFLT.KHT Default/<unknown language>
DFLT.KSW Default/<unknown language>
DFLT.KYU Default/<unknown language>
DFLT.SHN Default/Shan
mym2 <unknown script>
mym2.CSH <unknown script>/<unknown language>
mym2.KHN <unknown script>/<unknown language>
mym2.KHT <unknown script>/<unknown language>
mym2.KSW <unknown script>/<unknown language>
mym2.KYU <unknown script>/<unknown language>
mym2.SHN <unknown script>/Shan
mymr Myanmar
mymr.CSH Myanmar/<unknown language>
mymr.KHN Myanmar/<unknown language>
mymr.KHT Myanmar/<unknown language>
mymr.KSW Myanmar/<unknown language>
mymr.KYU Myanmar/<unknown language>
mymr.SHN Myanmar/Shan
mymr.dlft Myanmar/<unknown language>
One of the benefits of supporting locl is that not all variations supported by a language system are exposed as features in a font. It also spares Python developers from having to know their way around the OpenType feature internals of each font.
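The script/language-system lookup assumed earlier in this comment (use the language system if the font has it, otherwise the script default, otherwise DFLT.dflt) can be sketched roughly as follows. This only illustrates the lookup order and is not raqm's or HarfBuzz's actual code. The `pick_langsys` helper and the small BCP 47 to OpenType tag table are assumptions made for the example; the tags themselves come from the otfinfo listings above.

```python
# Illustrative only: a tiny BCP 47 -> OpenType language system tag table,
# limited to tags that appear (with a name) in the otfinfo listings.
BCP47_TO_OT = {
    "vi": "VIT", "sr": "SRB", "mk": "MKD",
    "ku": "KUR", "sd": "SND", "shn": "SHN",
}


def pick_langsys(supported, script, lang=None):
    """Return the script/language system to use, in otfinfo-style notation.

    `supported` is a set of entries such as {"cyrl", "cyrl.SRB"}.
    This helper is hypothetical; real shapers do this internally.
    """
    ot_lang = BCP47_TO_OT.get(lang)
    if ot_lang and f"{script}.{ot_lang}" in supported:
        return f"{script}.{ot_lang}"   # language-specific system
    if script in supported:
        return f"{script}.dflt"        # script default
    return "DFLT.dflt"                 # last resort


gentium = {"DFLT", "cyrl", "cyrl.MKD", "cyrl.SRB",
           "grek", "latn", "latn.IPPH", "latn.VIT"}
print(pick_langsys(gentium, "cyrl", "sr"))
print(pick_langsys(gentium, "cyrl"))
print(pick_langsys(gentium, "arab", "ku"))
```

Without a language, the Cyrillic string resolves to the script default, which is why passing the locl feature alone changes nothing.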
Raqm has raqm_set_language().
Thank you for the reference. This feature is not available right now in mplcairo, but I would probably accept a PR adding support for it using an extension of the opentype feature syntax, i.e. font=Path("/path/to/font.ttf|frac,onum,locl=vi,...")
(AFAICS this syntax has the advantage of also allowing one to set the feature on a certain character subrange, without introducing a new API -- does this seem reasonable to you? Do you foresee problems with this approach?)
I’d not overload the locl feature tag; something like language=XXXX would be better (the language can affect any feature, not just locl).
Sure, that seems reasonable too.
I think the following patch is sufficient (it does work locally for me on the Gentium cyrillic example), can you confirm?
diff --git i/src/_raqm.h w/src/_raqm.h
index 471d3af..adf00cf 100644
--- i/src/_raqm.h
+++ w/src/_raqm.h
@@ -13,6 +13,7 @@ extern "C" {  // Support raqm<=0.2.
   _(get_glyphs) \
   _(layout) \
   _(set_freetype_face) \
+  _(set_language) \
   _(set_text_utf8) \
   _(version_string) \
   _(version_atleast)
diff --git i/src/_util.cpp w/src/_util.cpp
index e5cbecd..1d43a25 100644
--- i/src/_util.cpp
+++ w/src/_util.cpp
@@ -797,7 +797,13 @@ GlyphsAndClusters text_to_glyphs_and_clusters(cairo_t* cr, std::string s)
          *static_cast<std::vector<std::string>*>(
            cairo_font_face_get_user_data(
              cairo_get_font_face(cr), &detail::FEATURES_KEY))) {
-      TRUE_CHECK(raqm::add_font_feature, rq, feature.c_str(), -1);
+      auto lang_tag = "language="s;
+      if (feature.substr(0, lang_tag.size()) == lang_tag) {
+        TRUE_CHECK(raqm::set_language,
+                   rq, feature.c_str() + lang_tag.size(), 0, s.size());
+      } else {
+        TRUE_CHECK(raqm::add_font_feature, rq, feature.c_str(), -1);
+      }
     }
     TRUE_CHECK(raqm::layout, rq);
     auto num_glyphs = size_t{};
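For readers less at home in C++, the dispatch this patch adds can be mirrored in a few lines of Python. This is a sketch of the logic only; `split_language` is a hypothetical name, and splitting the font string on `|` and `,` is assumed from the feature syntax proposed earlier in the thread rather than taken from mplcairo's source.

```python
def split_language(entries):
    """Separate 'language=xx' entries from plain OpenType feature strings.

    Mirrors the patch: an entry starting with 'language=' sets the language
    for the whole text; anything else is treated as a font feature.
    """
    lang_tag = "language="
    language = None
    features = []
    for entry in entries:
        if entry.startswith(lang_tag):
            language = entry[len(lang_tag):]   # would go to raqm_set_language
        else:
            features.append(entry)             # would go to raqm_add_font_feature
    return language, features


spec = "/path/to/font.ttf|frac,onum,language=sr"
_, _, suffix = spec.partition("|")
language, features = split_language(suffix.split(","))
print(language, features)
```

As in the patch, the language applies to the entire string (start 0, full length); if several `language=` entries are given, the last one wins.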
Supporting different languages over a single string (perhaps reusing an indexing syntax like https://harfbuzz.github.io/harfbuzz-hb-common.html#hb-feature-from-string) would be left as an exercise to the reader...
(@khaledhosny Would it be safe for the tag to be named "lang" instead of "language"? (i.e. will there ever be a font feature which is actually called "lang"?) This would perhaps allow "abusing" hb_feature_from_string to support indexing syntax here.)
Would it be safe for the tag to be named "lang" instead of "language"
Feature tags can be any four bytes, so nothing prevents a font from having a lang feature, and one can never know what features would be registered in the future.
OK, I'll stick to the patch above for now (if either you or @andjc can confirm that it works) and defer slicing syntax to another time, then.
The above patch is now in master. Leaving open as we may consider implementing slicing later.
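With the patch in, a font specification carrying a language might be assembled as below. This is a minimal sketch: `font_spec` is a hypothetical convenience helper, not part of mplcairo, and it only builds the `Path("...|feature,...")` form discussed earlier in the thread.

```python
from pathlib import Path


def font_spec(font_path, *features, language=None):
    """Build a Path with a '|feature,...,language=xx' suffix (hypothetical helper)."""
    entries = list(features)
    if language is not None:
        entries.append(f"language={language}")
    if not entries:
        return Path(font_path)
    return Path(f"{font_path}|{','.join(entries)}")


font = font_spec("/path/to/font.ttf", language="sr")
# With mplcairo as the active backend, this could then be passed to a text call:
#     ax.text(0.5, 0.5, "б г д п т ѓ", font=font)
```

Keeping the language in the same suffix as the features means no new keyword argument is needed on the Matplotlib side.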
Also pushed support for slicing. Thus closing, but feel free to ping for reopen in case I missed anything.
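The slicing syntax that was pushed is not shown in this thread. Purely as an illustration, here is how a HarfBuzz-style `tag[start:end]=value` range could be turned into the start and length arguments that raqm_set_language takes. The spelling `language[start:end]=xx` and the `parse_language` helper are assumptions; check the mplcairo README for the syntax actually accepted.

```python
import re

# Hypothetical spelling modelled on hb_feature_from_string: language[start:end]=xx
_LANG_RE = re.compile(r"language(?:\[(\d*):(\d*)\])?=(.+)")


def parse_language(entry, text_len):
    """Return (lang, start, length) for a language entry, or None otherwise."""
    m = _LANG_RE.fullmatch(entry)
    if m is None:
        return None
    start = int(m.group(1)) if m.group(1) else 0       # missing start -> 0
    end = int(m.group(2)) if m.group(2) else text_len  # missing end -> end of text
    end = min(end, text_len)                           # clamp to the text
    start = min(start, end)
    return m.group(3), start, end - start


print(parse_language("language=sr", 11))
print(parse_language("language[2:5]=mk", 11))
print(parse_language("frac", 11))
```

An entry without a range covers the whole text, which matches the behaviour of the original patch.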