Non-ASCII characters are displayed incorrectly (code and font mismatch)

Question

Closed this issue 5 years ago · 6 comments

There is a mismatch between the contents of the font file (font5x8.bin) and the code that renders characters. If you try to display a string containing non-English letters you quickly see that you get incorrect glyphs displayed for them:

fb.text("Hallå, världen!") # Using the included font5x8.bin font
# Displays "Hallσ, vΣrlden!"

I think that it is reasonable to expect the text method to display the right glyph for all the characters that are supported by the font5x8.bin font file. Currently, some of them are incorrectly mapped.

Strings in CircuitPython (and Python 3) can store any Unicode character. The problem is how they get processed by adafruit_framebuf.py. It converts each character to an integer, and uses that as an index in the font containing 256 glyphs. Mapping Unicode code points U+0000-U+00FF to bytes 0x00-0xFF happens to be what the ISO 8859-1 encoding does. So, the code now implicitly encodes the text into ISO 8859-1.

I looked at the glyphs in the font5x8.bin file and figured out that it uses the CP-437 encoding. ISO 8859-1 and CP-437 both map the printable ASCII (or "English") range of characters to the same byte values, which explains why this has not been reported before. ISO 8859-1 contains more letters, but CP-437 contains some useful line and box drawing glyphs.
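The mismatch described above can be reproduced on a desktop Python 3, where the cp437 codec is available. This is only a sketch of the effect, not the library's code: the helper name simulate_render is made up for illustration, and it assumes every character is in the range U+0000-U+00FF.

```python
def simulate_render(text):
    """Show which glyphs a CP-437 font produces when it is indexed
    directly by Unicode code point (hypothetical helper, illustration only)."""
    shown = []
    for ch in text:
        index = ord(ch)  # code point used as glyph index (implicit ISO 8859-1)
        if index > 0xFF:
            raise ValueError("character outside the 256-glyph font: %r" % ch)
        # The glyph stored at that index is whatever CP-437 assigns to the byte.
        shown.append(bytes([index]).decode("cp437"))
    return "".join(shown)


print(simulate_render("Hallå, världen!"))
```

The ASCII letters come out unchanged, while å (U+00E5) and ä (U+00E4) pick up the CP-437 glyphs stored at bytes 0xE5 and 0xE4.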

Strings in CircuitPython are neutral encoding-wise, but the bitmap fonts are not. The ideal solution would be to pair the font with its encoding, for example by adding a keyword argument to the text method:

fb.text("Hallå, världen!", font_file="font5x8.bin", font_encoding="cp437")

The default value of this new argument could be "cp437" to match the default file. The user should either use default values for both font_file and font_encoding or specify both.

I expected this to be simple to implement: simply do string.encode(font_encoding) to calculate the correct bytes (or font glyph indices) in the text method. In Python 3 this would work. However, the encode method is only partially implemented in CircuitPython. It only supports UTF-8 as the encoding. And even worse, it disregards the argument and always uses UTF-8. The encode method would need to be extended with more encodings (whose tables take up space, which can be a concern). Or a special CP437-encoding function could be added to the adafruit_framebuf library to support the included font.

Another solution is to change the contents of the included font5x8.bin file to use ISO 8859-1 instead. This will make text rendering just work for the range U+0000-U+00FF. This range happens to be enough for my native language (Swedish), so this is the workaround I used for my own project.

Answer 1 · 2019-08-24T17:54:10.000Z

hiya we've solved this in a different way - as of CircuitPython 5.0 you'll use displayio which can handle unicode fonts of any sort
in CPython you'd use PIL(low)

Answer 2 · 2019-08-24T18:11:54.000Z

@ladyada : Okay. Thanks for replying!

I understand that framebuf is deprecated, but I cannot get displayio to fit my Circuit Playground Express (even when removing frozen modules). I have something that works for me, so I only filed this issue to help others.

Answer 3 · 2019-08-24T18:13:21.000Z

if you'd like to add unicode font support, we're happy to take a look, i dont think it will fit on the express but it is worth a try! it sounds like your case is very very specialized and you've found why we're moving on from this method :)

Answer 4 · 2019-08-24T18:14:05.000Z

For anyone who finds it useful: this is a script to convert the font from CP-437 to ISO 8859-1:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('infile')
parser.add_argument('outfile')
args = parser.parse_args()

glyphs = {}
empty_glyph = None
glyph_width = None
glyph_height = None

with open(args.infile, "rb") as f:
    glyph_width = f.read(1)[0]
    glyph_height = f.read(1)[0]
    assert glyph_height == 8
    empty_glyph = bytes(glyph_width * [0])
    print("glyph size: %dx%d" % (glyph_width, glyph_height))
    for i in range(256):
        character = bytes([i]).decode("cp437")
        columns = f.read(glyph_width)
        glyphs[character] = columns

with open(args.outfile, "wb") as f:
    f.write(bytes([glyph_width]))
    f.write(bytes([glyph_height]))
    for x in range(16):
        for y in range(16):
            i = x*16 + y
            character = bytes([i]).decode("latin1")
            columns = glyphs.get(character, empty_glyph)
            print("X" if character in glyphs else ".", end="")
            f.write(columns)
        print("")

Run it like this: python3 convert_font.py font5x8.bin font5x8_latin1.bin

Answer 5 · 2019-08-25T17:35:38.000Z

thanks, want to commit that as a separate file? maybe useful for others! :)

Answer 6 · 2020-04-29T20:44:55.000Z

@raek If you're interested in submitting that file, please let me know. For now, I'm going to close this issue.