rosettatype/hyperglot

`hyperglot` inserts Japanese into the list of languages with Latin script; fonts with ancient Greek scripts are not detected.

gusbemacbe opened this issue · 3 comments

Good morning!

I was building a Python script that reads all my favourite languages from a YAML file, checks whether a given font supports them, and then generates a Markdown file listing the supported languages.

The font “Aussan” does not have Japanese script.

Yet hyperglot decided that Japanese belongs in the list of languages with Latin script and inserted it in the list.

I also tested another font that does have Japanese script (“Unifont”), and hyperglot did not detect the Japanese language.

  • languages.yml:

    - "por": 'Alemão'
      "eng": 'German'
      "iso": 'deu'
    - "por": 'Grego'
      "eng": 'Greek'
      "iso": 'ell'
    - "por": 'Inglês'
      "eng": 'English'
      "iso": 'eng'
    - "por": 'Português'
      "eng": 'Portuguese'
      "iso": 'por'
    - "por": 'Japonês'
      "eng": 'Japanese'
      "iso": 'jpn'
  • generate-language-support-list.py:

    #!/usr/bin/env python
    
    import os
    import yaml
    
    from hyperglot.parse import parse_font_chars
    from hyperglot.languages import Languages
    
    with open('languages.yml') as fh:
        languages = yaml.load(fh, Loader=yaml.FullLoader)
    
    # Font name
    font = "Código aberto – Arsenal – Assuan.ttf"
    name = "Arsenal – Assuan"
    
    # Walk the directory tree until the font file is found
    def find_font_file(path):
        for root, dirs, files in os.walk(path):
            for file in files:
                if file == font:
                    return os.path.join(root, file)
    
    def test_languages():
        # Path to the font file
        font_file = find_font_file('.')
    
        # Parsing the font file
        chars = parse_font_chars(font_file)
    
        Langs = Languages()
        supported = Langs.supported(chars)
    
        # Writing a Markdown file
        with open('{}.md'.format(name), 'w') as f:
            f.write('#### {}\n\n'.format(name))
    
            f.write('* Idiomas com alfabeto latino:\n')
            for lang in languages:
                if lang["iso"] in supported["Latin"]:
                    f.write("\t* {}\n".format(lang["por"]))
    
            # Only write the Greek section if the font supports Greek
            if "Greek" in supported:
                f.write('\n* Idiomas com alfabeto grego:\n')
                for lang in languages:
                    if lang["iso"] in supported["Greek"]:
                        f.write("\t* {}\n".format(lang["por"]))
    
            # Only write the Japanese section if the font supports Japanese
            if "Japanese" in supported:
                f.write('\n* Idiomas com ideogramas japoneses:\n')
                for lang in languages:
                    if lang["iso"] in supported["Japanese"]:
                        f.write("\t* {}\n".format(lang["por"]))
    
    if __name__ == '__main__':
        test_languages()

Output:

#### Arsenal – Assuan

* Idiomas com alfabeto latino:
	* Alemão
	* Inglês
	* Português
	* Japonês

The fonts by George Douros contain ancient Greek scripts such as Linear A and Linear B, and hyperglot did not detect that these fonts support them.
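One way to check this independently is to intersect the character set hyperglot parses from the font with the Unicode blocks for those scripts. A minimal sketch (the `block_coverage` helper and the hard-coded block ranges are mine, not hyperglot API; in practice you would pass it the set returned by `parse_font_chars`):

```python
# Sketch: count how many codepoints of a Unicode block a character set
# covers. "chars" stands in for the set hyperglot's parse_font_chars
# returns; the helper and block table below are illustrative only.

BLOCKS = {
    "Linear B Syllabary": (0x10000, 0x1007F),
    "Linear B Ideograms": (0x10080, 0x100FF),
    "Linear A": (0x10600, 0x1077F),
}

def block_coverage(chars, start, end):
    """Return (codepoints present, block size) for the range [start, end]."""
    present = sum(1 for c in chars if start <= ord(c) <= end)
    return present, end - start + 1

# Synthetic example: a font exposing the first 80 Linear B codepoints.
chars = {chr(cp) for cp in range(0x10000, 0x10050)}
for name, (start, end) in BLOCKS.items():
    present, total = block_coverage(chars, start, end)
    print("{}: {}/{}".format(name, present, total))
```

A font that covers none of these blocks clearly cannot support the scripts, whatever the language database says.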

I solved this issue and opened a pull request, #110, adding support for the languages that hyperglot could not detect.

  • I improved the following code:

    #!/usr/bin/env python
    import os
    import yaml
    
    from hyperglot.parse import parse_font_chars
    from hyperglot.languages import Languages
    
    # Loading the languages.yml file
    with open('scripts/yaml/languages.yml') as fh:
        languages = yaml.load(fh, Loader=yaml.FullLoader)
    
    # Font name
    font_name = "Fontes monoespaçadas – Código aberto – Unifont"
    
    # Walk the directory tree until a font file matching the name is found
    def find_font_file(path):
        for root, dirs, files in os.walk(path):
            for file in files:
                if file.endswith((".ttf", ".otf", ".woff")) and font_name in file:
                    return os.path.join(root, file)
    
    def test_languages():
        # Path to the font file
        font_file = find_font_file('.')
    
        # Parsing the font file
        chars = parse_font_chars(font_file)
    
        Langs = Languages()
        supported = Langs.supported(chars, includeAllOrthographies=True,
                      includeHistorical=True,
                      includeConstructed=True)
    
        # Writing a Markdown file
        with open('Apoio linguístico por fonte/{}.md'.format(font_name), 'w') as f:
            f.write('#### {}\n\n'.format(font_name))
    
            f.write('* Idiomas com alfabeto latino:\n')
            for lang in languages:
              if lang["iso"] in supported["Latin"]:
                if lang["iso"] == "jpn":
                  f.write("\t* Japonês (*romaji*)\n")
                else:
                  f.write("\t* {}\n".format(lang["por"]))
    
            # Only write the Greek section if the font supports Greek
            if "Greek" in supported:
              f.write('\n* Idiomas com alfabeto grego:\n')
              for lang in languages:
                if lang["iso"] in supported["Greek"]:
                  f.write("\t* {}\n".format(lang["por"]))
    
            # Only write the Japanese section if all three Japanese scripts are supported
            if all(key in supported for key in ("Kanji", "Hiragana", "Katakana")):
              f.write('\n* Idiomas com sílabas japonesas:\n')
              for lang in languages:
                if any(lang["iso"] in supported[key] for key in ("Kanji", "Hiragana", "Katakana")):
                  f.write("\t* {}\n".format(lang["por"]))
    
    if __name__ == '__main__':
        test_languages()

> hyperglot decided that Japanese belongs in the list of languages with Latin script and inserted it in the list.
>
> I also tested another font that does have Japanese script (“Unifont”), and hyperglot did not detect the Japanese language.

For the Unifont included in the zip file you attached to the PR for testing, I cannot confirm this with the CLI. Perhaps your script uses different defaults for the language support detection? `hyperglot path/to/Unifont.ttf` does list Japanese for Latin, Katakana, Hiragana, and Kanji.


> The font “Aussan” does not have Japanese script.

Are you saying the Aussan font does not have Japanese support detected when it should, or that it has support detected when it shouldn't? Could you post the output of `parse_font_chars` on that file?
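If it helps, a compact way to report that output is to bucket the parsed characters by the Japanese-relevant Unicode ranges. A sketch (the `summarize` helper and the range table are my additions; only `parse_font_chars` is hyperglot's, and the font filename is a placeholder):

```python
# Sketch: bucket a character list (as returned by hyperglot's
# parse_font_chars) into the Unicode ranges relevant to Japanese.
# The helper and the range table are illustrative, not hyperglot API.

RANGES = {
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}

def summarize(chars):
    """Count how many characters fall into each range of interest."""
    counts = {name: 0 for name in RANGES}
    for c in chars:
        cp = ord(c)
        for name, (start, end) in RANGES.items():
            if start <= cp <= end:
                counts[name] += 1
    return counts

# With the real font (requires the file locally; path is a placeholder):
# from hyperglot.parse import parse_font_chars
# print(summarize(parse_font_chars("Aussan.ttf")))

# Synthetic example: one Latin letter, one hiragana, one katakana, one kanji.
print(summarize(["a", "\u3042", "\u30ab", "\u65e5"]))
```

All-zero counts would mean the font has no Japanese glyphs at all, which would pin down whether the problem is in the font or in the detection.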