DenverCoder1/table2ascii

Different languages cause layout changes.

Neillife opened this issue · 10 comments

There is a layout offset problem when the output contains text in different languages.

[image: Issues]

The complete code is as follows.

from table2ascii import table2ascii as t2a
from table2ascii import PresetStyle
from table2ascii import Alignment

output = t2a(
    header=["日期", "test"],
    body=[["2022/12/11", "test"], ["2022/1/1", "測試"]],
    cell_padding=5,
    style=PresetStyle.double_thin_compact,
    alignments=[Alignment.CENTER] * 2
)

print(output)

Is there any solution?

That is caused by the font not having monospace versions of those characters. It is similar when dealing with emoji.

Unfortunately, the library cannot do anything about that: it has no way to know how wide variable-width characters will appear in the font used for display, and even if it did, it could not line them up perfectly.

The library places the correct number of spaces, so the best you can do is install a different font for it to use, or manually adjust the widths using obscure varying-width whitespace characters and make it so it is aligned when using that particular font.

See also #32
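To make the mismatch concrete, here is a minimal sketch that compares the number of code points in a cell with the number of columns it is likely to occupy. It uses only `unicodedata.east_asian_width` from the standard library. The helper name `terminal_columns` is made up for illustration, and the two-column rule is an assumption about the terminal and font, not something the library guarantees.

```python
import unicodedata


def terminal_columns(text: str) -> int:
    """Estimate columns, assuming Wide/Fullwidth characters take two."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# Cells from the table in the report above.
for cell in ["test", "日期", "2022/1/1", "測試"]:
    print(f"{cell!r}: len={len(cell)}, columns={terminal_columns(cell)}")
```

Padding computed from `len()` is correct in characters, but each cell containing Chinese text ends up wider on screen than the count suggests.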

I check the Unicode range of the text to detect the language and work around the layout offset.

For Chinese text, I append one \u200b (zero-width space) per character, so that the string length matches the width it takes up on screen.
It may not be the best solution, but it fixes the layout offset problem.

The complete code is as follows.

from table2ascii import table2ascii as t2a
from table2ascii import PresetStyle
from table2ascii import Alignment

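def handle_layout_offset(text):
    # Count CJK ideographs one character at a time. Each one is rendered
    # double width, so it needs one zero-width space (U+200B) to make
    # len() of the padded string match the display width.
    cjk_count = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fa5")
    return "\u200b" * cjk_count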

headerList = ["日期", "test"]
bodyList = [["2022/12/11", "test"], ["2022/1/1", "測試"]]

for i in range(len(headerList)):
    headerList[i] += handle_layout_offset(headerList[i])    

for body in bodyList:
    for i in range(len(body)):
        body[i] += handle_layout_offset(body[i])

output = t2a(
    header=headerList,
    body=bodyList,
    cell_padding=5,
    style=PresetStyle.double_thin_compact,
    alignments=[Alignment.CENTER] * 2
)

print(output)

Output result:
[image: output]

Interesting: it appears the characters are exactly double width.

I still think it will depend on the font and program used to render it, so if there is an internal fix, it should probably be an opt-in, toggle-able flag that would count potential double-width characters as 2 when determining the length.

The zero-width space solution does seem to be a good workaround for an external solution, though.
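A rough sketch of what such an opt-in flag could look like when centering a cell. The names `cell_width`, `center_cell`, and `wide_as_two` are hypothetical and are not part of table2ascii; the width check relies on `unicodedata.east_asian_width` and assumes the renderer shows Wide and Fullwidth characters in exactly two columns.

```python
import unicodedata


def cell_width(text: str, wide_as_two: bool = False) -> int:
    """Return len(text), or a column estimate when the flag is on."""
    if not wide_as_two:
        return len(text)
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def center_cell(text: str, width: int, wide_as_two: bool = False) -> str:
    """Center text in `width` columns using the chosen width measure."""
    pad = max(width - cell_width(text, wide_as_two), 0)
    left = pad // 2
    return " " * left + text + " " * (pad - left)


print(repr(center_cell("測試", 8)))                    # padded by character count
print(repr(center_cell("測試", 8, wide_as_two=True)))  # padded by column estimate
```

Keeping the flag off by default preserves the old output, while turning it on trades that stability for better alignment in fonts that really are double width.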

If you have a chance, let me know what you think of the proposed solution in #63

My current thinking is that we can add a flag to toggle the feature and default it to True. That makes it a major release (1.0.0), but it stays possible to revert to the old behavior.

Some relevant links that could help with this:

https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters

Python API - https://pypi.org/project/wcwidth/

wcwidth calculates the display width of a Unicode string. Maybe the keywords I entered in the search engine were not precise enough, because wcwidth is exactly the Python library I was looking for.
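For reference, a short sketch of measuring cells with `wcswidth` from the wcwidth package linked above. The fallback branch is an assumption added only so the snippet runs without the package installed; it is a rough approximation based on `unicodedata`, not a replacement for wcwidth.

```python
import unicodedata

try:
    from wcwidth import wcswidth  # third-party: pip install wcwidth
except ImportError:
    def wcswidth(text: str) -> int:
        """Rough stand-in: 0 for zero-width, 2 for wide, otherwise 1."""
        total = 0
        for ch in text:
            if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
                continue  # combining marks and format characters
            wide = unicodedata.east_asian_width(ch) in ("W", "F")
            total += 2 if wide else 1
        return total


for cell in ["日期", "test", "2022/1/1", "測試"]:
    print(f"{cell!r}: len={len(cell)}, wcswidth={wcswidth(cell)}")
```

The reported width is still only an expectation about the terminal; a proportional font on a web page may render the same text differently.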

I'm wondering if it can handle all languages, as well as emoji and special symbols?
This would be the perfect solution if possible!
I'm looking forward to your next table2ascii release.

I think the use_wcwidth flag can default to True, as long as that does not change the output produced by the original version.

If the font used for display renders other languages, emoji, and special symbols exactly 0, 1, or 2 characters wide, it should fix those as well. Otherwise, the alignment may still be slightly off.

For the most part, it seems pretty good, at least in my terminal.

image

Sites such as GitHub and Discord seem to still not line up the Chinese characters exactly due to the font.

+----+----+----+----+----+
|  ​  | 🦁 | 🦡 | 🦅 | 🐍 |
+----+----+----+----+----+
| 💻 | ✅ | ✅ | ❌ | ❌ |
+----+----+----+----+----+
| 📅 | ✅ | ❌ | ✅ | ❌ |
+----+----+----+----+----+
| 🥞 | 日 | 月 | 火 | 水 |
+----+----+----+----+----+

Yeah, in nearly all cases the output will be the same as before. Even with your zero-width space workaround, it should still be fine, since the zero-width spaces are counted as 0 while the other characters are counted as 2.

The output does change in some cases, although it seems like in nearly all cases, it makes it better.
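A small sketch of why the zero-width space workaround and width-aware counting should agree. The function `width_aware_len` is a hypothetical stand-in for the new counting rule described above (0 for a zero-width space, 2 for a wide character, 1 otherwise); it is not the library's actual implementation.

```python
import unicodedata

ZWSP = "\u200b"  # zero-width space


def width_aware_len(text: str) -> int:
    """Hypothetical count: 0 for ZWSP, 2 for wide characters, else 1."""
    total = 0
    for ch in text:
        if ch == ZWSP:
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


padded = "日期" + ZWSP * 2  # what the workaround produces for this header

# Old rule (len) and width-aware rule give the same number for the padded cell.
print(len(padded), width_aware_len(padded))
```

For the unpadded string the two rules differ (2 versus 4), which is the case where the output changes.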

Yeah, this really makes it even better!
I think the width of the table can now be calculated more accurately.
Maybe this solution can reduce the cases where many characters cause the layout to shift.

It looks like Terminal likes this scheme, but sites like GitHub and Discord don't. 😮

This feature is now released in version 1.0.1