evanbowman/BPCore-Engine

How can I print other UTF chars in this engine?

zxsean opened this issue · 20 comments

Hello, if I want to add new UTF-8 characters to this engine, how can I do it?

The char images will need to be added to images/charset.png. Then, the codepoints need to be mapped to 8x8 tile indices in https://github.com/evanbowman/BPCore-Engine/blob/master/source/localization.cpp.

Which chars do you want to add?

There are many Chinese characters, and most of the common ones are needed: generally speaking, a Chinese ROM requires at least 2500 characters. Adding all of them manually would be quite troublesome. Have you considered adding a conversion tool for this?

I would be happy to make the code changes to the engine myself, as long as you don't need thousands of new glyphs :). If it's a lot of new chars I'll need a bit of help.

Oh I see, that's a lot.

I have a small idea: would it be possible to convert a font directly into a mapping? The link below has a great Chinese font designed specifically for pixel games; take a look: https://zhuanlan.zhihu.com/p/142899865

I understand how tedious it would be to add them manually; I helped someone translate one of my other projects into Chinese, and it was very time-consuming: https://github.com/evanbowman/blind-jump-portable/blob/master/source/localization.cpp#L350

I don't know much about conversion tools, how do they work?

I haven't actually made a font tool myself. My understanding is that this part of your code converts each char into the corresponding pixel matrix, essentially drawing the character. Could these two tools be used to do the conversion? https://github.com/itouhiro/bdf2bmp https://github.com/hmgle/dot_matrix_font_to_bmp

Yeah, maybe the best way would be to write a little script that appends char images to the charset and generates c++ code for mapping them.

So my idea would be, basically:

Inputs to script:

  1. font file
  2. charset.png
  3. unicode chars

Outputs from script:

  1. charset.png with char images appended
  2. C++ code that can be pasted into the engine (pseudocode):
    case utf8_char("一"):
        return 135; // mapping into the charset file
    case utf8_char("二"):
        return 136;
    // etc.
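To make the pseudocode concrete, here is a small Python sketch of the same lookup idea: pack a character's UTF-8 bytes into one integer key, then map keys to tile indices. The byte-packing scheme is an assumption about how a UTF8_GETCHR-style macro might work, not the engine's actual implementation.

```python
def utf8_key(ch):
    # Pack the UTF-8 bytes of one character into a single integer,
    # e.g. "一" (0xE4 0xB8 0x80) becomes 0xE4B880.
    # This packing is an assumption; the engine's macro may differ.
    key = 0
    for b in ch.encode("utf-8"):
        key = (key << 8) | b
    return key

# The generated C++ switch corresponds to a table like this:
TILE_MAP = {
    utf8_key("一"): 135,
    utf8_key("二"): 136,
}

def tile_index(ch):
    # Returns the 8x8 tile index for a char, or None if unmapped.
    return TILE_MAP.get(utf8_key(ch))
```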

Good idea. But will the charset become problematically large with so many Chinese characters?

I think the engine will handle a large charset without too much trouble. But in terms of actually displaying the chars, the engine can currently only show 80 unique chars onscreen at once, due to limited video RAM: I reserved 80 tile slots in VRAM for dynamically loading char images as they're needed. Chars have to share VRAM with other game textures, so I had to decide how much VRAM to dedicate to displaying text. The 80-char limit could perhaps be increased.
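The slot-recycling idea described above can be sketched as a small LRU-style cache. This is illustrative Python, not the engine's actual C++ code; the 80-slot capacity is the only number taken from the description above.

```python
class GlyphCache:
    """Sketch of recycling a fixed pool of VRAM tile slots for glyphs."""

    def __init__(self, capacity=80):  # 80 reserved slots, per the text above
        self.capacity = capacity
        self.slots = {}   # codepoint -> slot index
        self.order = []   # least-recently-used codepoints first

    def slot_for(self, codepoint):
        # Already loaded: refresh its recency and reuse the slot.
        if codepoint in self.slots:
            self.order.remove(codepoint)
            self.order.append(codepoint)
            return self.slots[codepoint]
        # Pool full: evict the least recently used glyph and take its slot.
        if len(self.slots) >= self.capacity:
            evicted = self.order.pop(0)
            slot = self.slots.pop(evicted)
        else:
            slot = len(self.slots)
        self.slots[codepoint] = slot
        self.order.append(codepoint)
        return slot
```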

Thx!!

Seems to be trivially easy to extract glyph images with python:

from PIL import Image, ImageDraw, ImageFont

# sample text and font
font = ImageFont.truetype("/home/evan/Downloads/DinkieBitmap-7pxDemo.ttf", 8, encoding="unic")

unicode_text = u"你好,世界"

# measure the rendered line
text_width, text_height = font.getsize(unicode_text)

# create a canvas just big enough for the line
canvas = Image.new('RGB', (text_width, text_height), "white")

# draw the text onto the canvas, using black as the text color
draw = ImageDraw.Draw(canvas)
draw.text((0, 0), unicode_text, 'black', font)

# save the rendered text to a file
canvas.save("test.png", "PNG")

Output: (screenshot of the rendered text, test.png)

I'm busy with lots of stuff but I think I'll have time to write a script sometime this week, maybe I'll do it today. I'll let you know when I get it working!

Thx! Best regards.


Demo

from PIL import Image, ImageDraw, ImageFont
import sys


def load_font(path):
    return ImageFont.truetype(path, 8, encoding="unic")


def get_unique_glyphs(text):
    # deduplicate the input chars (order is not significant)
    return "".join(set(text))


def get_concat_h(im1, im2):
    # paste im1 and im2 side by side onto a new canvas
    dst = Image.new('RGBA', (im1.width + im2.width, im1.height))
    dst.paste(im1, (0, 0))
    dst.paste(im2, (im1.width, 0))
    return dst


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("usage: inject_glyphs.py <font_path> <original_charset_image_path> <file_of_chars>")
        sys.exit(1)

    font = load_font(sys.argv[1])

    with open(sys.argv[3], "r") as chars_file:
        unicode_text = get_unique_glyphs(chars_file.read().replace('\n', '').replace('\r', ''))

    text_width, text_height = font.getsize(unicode_text)

    # if the font renders taller than 8px, shift the baseline up to fit
    offset = 0
    if text_height > 8:
        offset = text_height - 8

    canvas = Image.new('RGB', (text_width, 8), "#000010")

    draw = ImageDraw.Draw(canvas)

    charset = Image.open(sys.argv[2])
    w, h = charset.size
    start_index = w // 8  # tile index of the first appended glyph

    with open("mappings.cpp", "w") as mapping_file:
        for i in range(len(unicode_text)):
            mapping_file.write("case UTF8_GETCHR(u8\"%s\"): return %d;\n" % (unicode_text[i],
                                                                             start_index + i))
            draw.text((i * 8, -offset), unicode_text[i], "#cdc3eb", font)

    canvas.save("charset.png", "PNG")

    # append the new glyph strip to the right of the original charset
    concat_l = Image.open("charset.png")
    get_concat_h(charset, concat_l).save("charset.png")

input: chars.txt

你好,世界

command:
python3 inject_glyphs.py /home/evan/Downloads/DinkieBitmap-7pxDemo.ttf /home/evan/bpcore/images/charset.png chars.txt

output: charset.png (screenshot of the charset with the new glyphs appended)
output: mappings

// mappings.cpp
case UTF8_GETCHR(u8"好"): return 187;
case UTF8_GETCHR(u8","): return 188;
case UTF8_GETCHR(u8"界"): return 189;
case UTF8_GETCHR(u8"你"): return 190;
case UTF8_GETCHR(u8"世"): return 191;

I was also working on this a little while ago; it seems to work alright. The engine code needs to be adjusted a bit so that these case statements can be copied into the localization file, but it's almost done, I think.

I have other stuff to do today, so I probably won't get a chance to update the engine code until tomorrow.

Cool! Thx so much.

I added 2500 of the most common Chinese characters to the engine (https://github.com/evanbowman/BPCore-Engine/releases/tag/21.9.10). Due to a linker error caused by the huge charset image, I ended up needing to break the charset image into multiple files. Therefore, I have not yet published the script for generating char mappings; it needs to be rewritten to correctly output multiple image files.
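The rewrite described above could split an over-wide charset strip into several files along 8px tile boundaries. A sketch of that idea; the chunk size here is a made-up value for illustration, not one taken from the engine:

```python
from PIL import Image

def split_charset(path, max_tiles_per_file=1024, tile_width=8):
    # Split a wide horizontal charset strip into fixed-width chunks so
    # that no single output image gets large enough to upset the linker.
    # max_tiles_per_file is an assumed limit, not an engine constant.
    strip = Image.open(path)
    w, h = strip.size
    chunk_w = max_tiles_per_file * tile_width
    parts = []
    for n, x in enumerate(range(0, w, chunk_w)):
        part = strip.crop((x, 0, min(x + chunk_w, w), h))
        parts.append(part)
        part.save("charset%d.png" % n)
    return parts
```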

I chose which chars to add to the engine by filtering this list https://github.com/ruddfawcett/hanziDB.csv/blob/master/data/hanziDB.csv based on which characters in the dataset were available in the DinkieBitmap-7px font table. I'd be happy to add more chars to the engine; just send me a list of any additional ones that you'd like me to add.

(example screenshot)