`hyperglot` inserts Japanese into the list of languages with Latin script; fonts with ancient scripts such as Linear A and B are not detected.
Good morning!
I was building a Python script that reads all my favourite languages from a YAML file, checks whether a given font supports those languages, and then generates a Markdown file with the list of supported languages.
The font “Aussan” does not have a Japanese script. Yet hyperglot listed Japanese among the languages supported with the Latin script, so my script inserted it into the list. I also tested another font that does have a Japanese script (“Unifont”), and there the Japanese language was not detected.
`languages.yml`:

```yaml
- "por": 'Alemão'
  "eng": 'German'
  "iso": 'deu'
- "por": 'Grego'
  "eng": 'Greek'
  "iso": 'ell'
- "por": 'Inglês'
  "eng": 'English'
  "iso": 'eng'
- "por": 'Português'
  "eng": 'Portuguese'
  "iso": 'por'
- "por": 'Japonês'
  "eng": 'Japanese'
  "iso": 'jpn'
```
`generate-language-support-list.py`:

```python
#!/usr/bin/env python
import json
import os

import yaml

from hyperglot.parse import parse_font_chars
from hyperglot.languages import Languages

languages = yaml.load(open('languages.yml'), Loader=yaml.FullLoader)

# Font name
font = "Código aberto – Arsenal – Assuan.ttf"
name = "Arsenal – Assuan"


# Finding all folders until finding the font file name "unifont.ttf"
def find_font_file(path):
    for root, dirs, files in os.walk(path):
        for file in files:
            if file == font:
                return os.path.join(root, file)


def test_languages():
    # Path to the font file
    font_file = find_font_file('.')

    # Parsing the font file
    chars = parse_font_chars(font_file)

    Langs = Languages()
    supported = Langs.supported(chars)

    # Writing a Markdown file
    with open('{}.md'.format(name), 'w') as f:
        f.write('#### {}\n\n'.format(name))

        f.write('* Idiomas com alfabeto latino:\n')
        for lang in languages:
            if lang["iso"] in supported["Latin"]:
                f.write("\t* {}\n".format(lang["por"]))

        # If the font does not have Greek script
        if "Greek" not in supported:
            f.write("")
        else:
            f.write('\n* Idiomas com alfabeto grego:\n')
            for lang in languages:
                if lang["iso"] in supported["Greek"]:
                    f.write("\t* {}\n".format(lang["por"]))

        # If the font does not have Japanese script
        if "Japanese" not in supported:
            f.write("")
        else:
            f.write('\n* Idiomas com ideogramas japoneses:\n')
            for lang in languages:
                if lang["iso"] in supported["Japanese"]:
                    f.write("\t* {}\n".format(lang["por"]))


if __name__ == '__main__':
    test_languages()
```
Output:

```markdown
#### Arsenal – Assuan

* Idiomas com alfabeto latino:
    * Alemão
    * Inglês
    * Português
    * Japonês
```
The fonts by George Douros include ancient scripts such as Linear A and Linear B, and hyperglot did not detect that these fonts contain those scripts.
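As a quick way to double-check this independently of hyperglot's language detection, the font's character set can be compared against the relevant Unicode blocks. This is only a diagnostic sketch: the font path is a placeholder, and it assumes `parse_font_chars(path)` returns an iterable of single-character strings, as its use in the script above suggests.

```python
from hyperglot.parse import parse_font_chars

# Placeholder path to one of the George Douros fonts in question
FONT_PATH = "path/to/font.ttf"

# Unicode block ranges for the scripts mentioned above
BLOCKS = {
    "Linear B Syllabary": (0x10000, 0x1007F),
    "Linear B Ideograms": (0x10080, 0x100FF),
    "Linear A": (0x10600, 0x1077F),
}

chars = parse_font_chars(FONT_PATH)
codepoints = {ord(c) for c in chars if len(c) == 1}

for block, (start, end) in BLOCKS.items():
    present = sum(1 for cp in codepoints if start <= cp <= end)
    print("{}: {} codepoints in the font".format(block, present))
```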
I solved this issue and opened a pull request, adding support for the new languages that hyperglot could not detect, in reference to #110.
I improved the following code:

```python
#!/usr/bin/env python
import os

import yaml

from hyperglot.parse import parse_font_chars
from hyperglot.language import Language, Orthography
from hyperglot.languages import Languages

# Loading the languages.yml file
languages = yaml.load(open('scripts/yaml/languages.yml'), Loader=yaml.FullLoader)

# Font name
font_name = "Fontes monoespaçadas – Código aberto – Unifont"


# Finding all folders until finding the font file name
def find_font_file(path):
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".ttf") or file.endswith(".otf") or file.endswith(".woff"):
                if font_name in file:
                    return os.path.join(root, file)


def test_languages():
    # Path to the font file
    font_file = find_font_file('.')

    # Parsing the font file
    chars = parse_font_chars(font_file)

    Langs = Languages()
    supported = Langs.supported(chars,
                                includeAllOrthographies=True,
                                includeHistorical=True,
                                includeConstructed=True)

    # Writing a Markdown file
    with open('Apoio linguístico por fonte/{}.md'.format(font_name), 'w') as f:
        f.write('#### {}\n\n'.format(font_name))

        f.write('* Idiomas com alfabeto latino:\n')
        for lang in languages:
            if lang["iso"] in supported["Latin"]:
                if lang["iso"] == "jpn":
                    f.write("\t* Japonês (*romaji*)\n")
                else:
                    f.write("\t* {}\n".format(lang["por"]))

        # If the font does not have Greek script
        if "Greek" not in supported:
            f.write("")
        else:
            f.write('\n* Idiomas com alfabeto grego:\n')
            for lang in languages:
                if lang["iso"] in supported["Greek"]:
                    f.write("\t* {}\n".format(lang["por"]))

        # If the font does not have Japanese script
        if "Kanji" not in supported or "Hiragana" not in supported or "Katakana" not in supported:
            f.write("")
        else:
            f.write('\n* Idiomas com sílabas japoneses:\n')
            for lang in languages:
                if lang["iso"] in supported["Kanji"] or lang["iso"] in supported["Hiragana"] or lang["iso"] in supported["Katakana"]:
                    f.write("\t* {}\n".format(lang["por"]))


if __name__ == '__main__':
    test_languages()
```
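A small side note on the Japanese check in the script above: the three per-script conditions could be collapsed into one helper, which keeps the per-language test in a single place. This is just a refactoring sketch of the script above, not part of hyperglot itself.

```python
# Sketch: one helper for the Kanji/Hiragana/Katakana checks used above.
JAPANESE_SCRIPTS = ("Kanji", "Hiragana", "Katakana")


def supports_japanese(supported, iso):
    """True if the language `iso` is listed under any of the Japanese script keys."""
    return any(script in supported and iso in supported[script]
               for script in JAPANESE_SCRIPTS)
```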
> Yet hyperglot listed Japanese among the languages supported with the Latin script, so my script inserted it into the list. I also tested another font that does have a Japanese script (“Unifont”), and there the Japanese language was not detected.
For the Unifont included in the zip file you attached to the PR for testing, I cannot confirm this for the CLI. Perhaps your script uses different defaults for the language support detection? `hyperglot path/to/Unifont.ttf` does list Japanese for Latin, Katakana, Hiragana, and Kanji.
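For what it's worth, the two scripts in this issue call `Languages.supported()` differently, which could explain the discrepancy. Below is a minimal side-by-side sketch, not a definitive check: the font path is a placeholder, and the keyword arguments are simply the ones the second script passes.

```python
from hyperglot.parse import parse_font_chars
from hyperglot.languages import Languages

chars = parse_font_chars("path/to/Unifont.ttf")  # placeholder path
Langs = Languages()

# First script: library defaults only
default_support = Langs.supported(chars)

# Second script: explicitly broadened detection
extended_support = Langs.supported(chars,
                                   includeAllOrthographies=True,
                                   includeHistorical=True,
                                   includeConstructed=True)

# Compare which scripts are reported and how many languages each contains
# (assuming the return value is a dict keyed by script name, as the scripts above index it)
for label, result in (("defaults", default_support), ("extended", extended_support)):
    print(label, {script: len(langs) for script, langs in result.items()})
```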
> The font “Aussan” does not have a Japanese script.
Are you saying the Aussan font does not have Japanese support detected when it should, or that it has support detected but shouldn't? Could you post the output of using `parse_font_chars` on that file?
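In case it helps, something along these lines should produce that output in an easy-to-paste form (the font path is a placeholder, and it assumes the returned values are single-character strings):

```python
from hyperglot.parse import parse_font_chars

# Placeholder path to the Aussan/Assuan font file
chars = parse_font_chars("path/to/Assuan.ttf")

# Print each character with its codepoint so the list is easy to share
for c in sorted(chars):
    if len(c) == 1:
        print("U+%04X  %s" % (ord(c), c))
    else:
        print(c)
```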