rosettatype/hyperglot

`hyperglot` inserts Japanese into the list of languages with Latin script; fonts with ancient Greek scripts are not detected.

gusbemacbe opened this issue · 3 comments

Good morning!

I was building a Python script that reads all my favourite languages from a YAML file, checks whether a given font supports them, and then generates a Markdown file listing the supported languages.

The font “Aussan” does not have Japanese script.

Yet hyperglot decided that Japanese belongs in the list of languages with Latin script and inserted it in the list.

I also tested another font that does have Japanese script (“Unifont”), and hyperglot did not detect the Japanese language.

  • languages.yml:

    - "por": 'Alemão'
      "eng": 'German'
      "iso": 'deu'
    - "por": 'Grego'
      "eng": 'Greek'
      "iso": 'ell'
    - "por": 'Inglês'
      "eng": 'English'
      "iso": 'eng'
    - "por": 'Português'
      "eng": 'Portuguese'
      "iso": 'por'
    - "por": 'Japonês'
      "eng": 'Japanese'
      "iso": 'jpn'
  • generate-language-support-list.py:

    #!/usr/bin/env python
    
    import os
    import yaml
    
    from hyperglot.parse import parse_font_chars
    from hyperglot.languages import Languages
    
    with open('languages.yml') as fh:
        languages = yaml.load(fh, Loader=yaml.FullLoader)
    
    # Font name
    font = "Código aberto – Arsenal – Assuan.ttf"
    name = "Arsenal – Assuan"
    
    # Walk the directory tree until the font file is found
    def find_font_file(path):
        for root, dirs, files in os.walk(path):
            for file in files:
                if file == font:
                    return os.path.join(root, file)
    
    def test_languages():
        # Path to the font file
        font_file = find_font_file('.')
    
        # Parsing the font file
        chars = parse_font_chars(font_file)
    
        Langs = Languages()
        supported = Langs.supported(chars)
    
        # Writing a Markdown file
        with open('{}.md'.format(name), 'w') as f:
            f.write('#### {}\n\n'.format(name))
    
            f.write('* Idiomas com alfabeto latino:\n')
            for lang in languages:
                if lang["iso"] in supported["Latin"]:
                    f.write("\t* {}\n".format(lang["por"]))
    
            # Only write the Greek section if the font supports Greek
            if "Greek" in supported:
                f.write('\n* Idiomas com alfabeto grego:\n')
                for lang in languages:
                    if lang["iso"] in supported["Greek"]:
                        f.write("\t* {}\n".format(lang["por"]))
    
            # Only write the Japanese section if the font supports Japanese
            if "Japanese" in supported:
                f.write('\n* Idiomas com ideogramas japoneses:\n')
                for lang in languages:
                    if lang["iso"] in supported["Japanese"]:
                        f.write("\t* {}\n".format(lang["por"]))
    
    if __name__ == '__main__':
        test_languages()

Output:

#### Arsenal – Assuan

* Idiomas com alfabeto latino:
	* Alemão
	* Inglês
	* Português
	* Japonês

The fonts by George Douros contain ancient Greek scripts such as Linear A and Linear B, and hyperglot did not detect that these fonts support them.
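One way to check this independently is to intersect the character set hyperglot parses from the font with the Unicode blocks for those scripts. A minimal sketch (the `block_coverage` helper and the hard-coded block ranges are mine, not hyperglot API; in practice you would pass it the set returned by `parse_font_chars`):

```python
# Sketch: count how many codepoints of a Unicode block a character set
# covers. "chars" stands in for the set hyperglot's parse_font_chars
# returns; the helper and block table below are illustrative only.

BLOCKS = {
    "Linear B Syllabary": (0x10000, 0x1007F),
    "Linear B Ideograms": (0x10080, 0x100FF),
    "Linear A": (0x10600, 0x1077F),
}

def block_coverage(chars, start, end):
    """Return (codepoints present, block size) for the range [start, end]."""
    present = sum(1 for c in chars if start <= ord(c) <= end)
    return present, end - start + 1

# Synthetic example: a font exposing the first 80 Linear B codepoints.
chars = {chr(cp) for cp in range(0x10000, 0x10050)}
for name, (start, end) in BLOCKS.items():
    present, total = block_coverage(chars, start, end)
    print("{}: {}/{}".format(name, present, total))
```

A font that covers none of these blocks clearly cannot support the scripts, whatever the language database says.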

I solved this issue and opened a pull request, #110, adding support for the languages that hyperglot could not detect.

  • I improved the following code:

    #!/usr/bin/env python
    import os
    import yaml
    
    from hyperglot.parse import parse_font_chars
    from hyperglot.languages import Languages
    
    # Loading the languages.yml file
    with open('scripts/yaml/languages.yml') as fh:
        languages = yaml.load(fh, Loader=yaml.FullLoader)
    
    # Font name
    font_name = "Fontes monoespaçadas – Código aberto – Unifont"
    
    # Walk the directory tree until a font file matching the name is found
    def find_font_file(path):
        for root, dirs, files in os.walk(path):
            for file in files:
                if file.endswith((".ttf", ".otf", ".woff")) and font_name in file:
                    return os.path.join(root, file)
    
    def test_languages():
        # Path to the font file
        font_file = find_font_file('.')
    
        # Parsing the font file
        chars = parse_font_chars(font_file)
    
        Langs = Languages()
        supported = Langs.supported(chars, includeAllOrthographies=True,
                      includeHistorical=True,
                      includeConstructed=True)
    
        # Writing a Markdown file
        with open('Apoio linguístico por fonte/{}.md'.format(font_name), 'w') as f:
            f.write('#### {}\n\n'.format(font_name))
    
            f.write('* Idiomas com alfabeto latino:\n')
            for lang in languages:
              if lang["iso"] in supported["Latin"]:
                if lang["iso"] == "jpn":
                  f.write("\t* Japonês (*romaji*)\n")
                else:
                  f.write("\t* {}\n".format(lang["por"]))
    
            # Only write the Greek section if the font supports Greek
            if "Greek" in supported:
              f.write('\n* Idiomas com alfabeto grego:\n')
              for lang in languages:
                if lang["iso"] in supported["Greek"]:
                  f.write("\t* {}\n".format(lang["por"]))
    
            # Only write the Japanese section if all three Japanese scripts are supported
            if all(key in supported for key in ("Kanji", "Hiragana", "Katakana")):
              f.write('\n* Idiomas com sílabas japonesas:\n')
              for lang in languages:
                if any(lang["iso"] in supported[key] for key in ("Kanji", "Hiragana", "Katakana")):
                  f.write("\t* {}\n".format(lang["por"]))
    
    if __name__ == '__main__':
        test_languages()

> hyperglot decided that Japanese belongs in the list of languages with Latin script and inserted it in the list.
>
> I also tested another font that does have Japanese script (“Unifont”), and hyperglot did not detect the Japanese language.

For the Unifont included in the zip file you attached to the PR for testing, I cannot confirm this with the CLI. Perhaps your script uses different defaults for the language support detection? `hyperglot path/to/Unifont.ttf` does list Japanese for Latin, Katakana, Hiragana, and Kanji.


> The font “Aussan” does not have Japanese script.

Are you saying the Aussan font does not have Japanese support detected when it should, or that it has support detected when it shouldn't? Could you post the output of `parse_font_chars` on that file?
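If it helps, a compact way to report that output is to bucket the parsed characters by the Japanese-relevant Unicode ranges. A sketch (the `summarize` helper and the range table are my additions; only `parse_font_chars` is hyperglot's, and the font filename is a placeholder):

```python
# Sketch: bucket a character list (as returned by hyperglot's
# parse_font_chars) into the Unicode ranges relevant to Japanese.
# The helper and the range table are illustrative, not hyperglot API.

RANGES = {
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}

def summarize(chars):
    """Count how many characters fall into each range of interest."""
    counts = {name: 0 for name in RANGES}
    for c in chars:
        cp = ord(c)
        for name, (start, end) in RANGES.items():
            if start <= cp <= end:
                counts[name] += 1
    return counts

# With the real font (requires the file locally; path is a placeholder):
# from hyperglot.parse import parse_font_chars
# print(summarize(parse_font_chars("Aussan.ttf")))

# Synthetic example: one Latin letter, one hiragana, one katakana, one kanji.
print(summarize(["a", "\u3042", "\u30ab", "\u65e5"]))
```

All-zero counts would mean the font has no Japanese glyphs at all, which would pin down whether the problem is in the font or in the detection.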