Japanese Strings don't get colored correctly

Question


Opened this issue 4 years ago · 8 comments

As shown in the attached screenshot, strings containing Japanese characters aren't displayed or colored correctly. I temporarily switched the string color to yellow, but the content between the quotes remains black. Only a single character gets colored correctly, and only when the caret is placed on it.

Mogens' initial thoughts on Slack were:

mogenslund: "If I see correctly, the Japanese characters occupy two columns. Since in some cases I clear the rest of the line in the terminal to clean up, the calculation goes wrong with these characters. I am pretty sure it is a view thing and that it is handled correctly in the buffer behind the view. [...] I guess handling these characters will be somewhat similar to handling the tab character."
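Mogens' comparison with the tab character can be made concrete. The sketch below is in Python purely for illustration (it is not Liquid's implementation language). It maps a buffer position to a screen column by expanding tabs to the next tab stop and counting Unicode East Asian Wide/Fullwidth characters as two columns, then uses that column to work out how many spaces are needed to clear the rest of the row. The helper names `char_width`, `column_of` and `clear_padding`, and the tab size of 4, are hypothetical assumptions, not Liquid's code. Combining characters and emoji sequences are deliberately ignored.

```python
import unicodedata

TAB_SIZE = 4  # assumption: tab stops every 4 columns


def char_width(ch: str) -> int:
    """Terminal columns for one character: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def column_of(line: str, index: int, tab_size: int = TAB_SIZE) -> int:
    """Screen column at which buffer position `index` of `line` is drawn."""
    col = 0
    for ch in line[:index]:
        if ch == "\t":
            # Jump to the next tab stop, as a terminal does.
            col += tab_size - (col % tab_size)
        else:
            col += char_width(ch)
    return col


def clear_padding(line: str, term_width: int, tab_size: int = TAB_SIZE) -> int:
    """Spaces needed after `line` to blank the rest of a terminal row."""
    return max(0, term_width - column_of(line, len(line), tab_size))


line = '"こんにちは"'
print(len(line), column_of(line, len(line)), clear_padding(line, 20))
```

For `"こんにちは"` (quotes included) the buffer holds 7 characters but the terminal needs 12 columns. On a 20-column row, padding computed from the character count would write 13 spaces instead of 8, i.e. 5 too many, spilling past the edge of the row.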

Answer 1 · 2020-10-29T19:30:37.000Z

Is there any way to determine if a character occupies one or two columns in the terminal?

Answer 2 · 2020-10-30T00:28:36.000Z

That's a very good question. There are 3 different types of characters in the Japanese language (hiragana, katakana and kanji), and as far as I have seen on different systems so far, they are all displayed with the same width. For example, a single hiragana あ (A) has the same width as its katakana counterpart ア, and both have the same width as a kanji such as 本 (book). I'm really not sure about this part, but it looks to me like one Japanese character takes up the same space as two Latin characters.
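The observation that hiragana, katakana and kanji share one width, twice that of a Latin letter, matches Unicode's East Asian Width property (Unicode Standard Annex #11). That property is also one answer to the question of how to tell one-column from two-column characters: characters classified W (Wide) or F (Fullwidth) usually occupy two terminal cells. Python's standard `unicodedata.east_asian_width` exposes it, and C programs can use POSIX `wcwidth()`. A minimal sketch, assuming a terminal that follows this convention; `columns` is a hypothetical helper:

```python
import unicodedata


def columns(ch: str) -> int:
    """Assumed terminal width: W (Wide) and F (Fullwidth) take 2 cells, others 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


samples = {"hiragana": "あ", "katakana": "ア", "kanji": "本", "latin": "A"}

for script, ch in samples.items():
    eaw = unicodedata.east_asian_width(ch)
    print(f"{script}: {ch} U+{ord(ch):04X} east_asian_width={eaw} columns={columns(ch)}")
```

All three Japanese samples report `W` (two columns), while `A` reports `Na` (narrow, one column).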

I was curious whether other non-ASCII characters are displayed properly, so I added some Russian and Danish (of which I know absolutely nothing 😅) to a test.txt and checked it in my terminal (XFCE) and in Liquid. As you can see in the screenshots, only the Japanese text isn't displayed correctly.
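A likely reason the Russian and Danish samples render fine: under Unicode's East Asian Width property, none of their letters are classified Wide (W) or Fullwidth (F). Some Cyrillic and Latin-1 letters are classified Ambiguous (A), which terminals outside CJK locales normally draw as one column. The hypothetical `display_width` helper below treats A as narrow by default, with a flag that mimics a terminal configured for a CJK locale. The sample sentences are made up for this example and are not the contents of the test.txt above.

```python
import unicodedata


def display_width(text: str, ambiguous_wide: bool = False) -> int:
    """Sum of terminal columns for `text`.

    W/F count as 2 columns. "A" (Ambiguous) counts as 2 only when
    ambiguous_wide is True; everything else counts as 1.
    """
    wide = {"W", "F"} | ({"A"} if ambiguous_wide else set())
    return sum(2 if unicodedata.east_asian_width(ch) in wide else 1 for ch in text)


for sample in ("Привет", "Hej, hvordan går det?", "こんにちは"):
    print(f"{sample!r}: {len(sample)} chars, {display_width(sample)} columns")
```

Only the Japanese sample has a column count that differs from its character count, which is exactly the case a view that assumes one column per character gets wrong.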

Answer 3 · 2020-10-30T00:37:13.000Z

Actually, there is also half-width katakana, though maybe it is not used as often these days. See https://en.wikipedia.org/wiki/Half-width_kana#Encoding for some details.

One might not normally think of the Latin alphabet as part of the Japanese writing system, but it is used alongside the other characters (though not really in the same way), and there is a full-width version of it (meaning each displayed character is twice as wide as the corresponding ASCII character).

FWIW, there is some info on half-width and full-width here: https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms
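The half-width and full-width forms mentioned above have their own East Asian Width classes: half-width katakana is H (Halfwidth, one column) and full-width Latin letters are F (Fullwidth, two columns). A width check therefore has to look at the individual code point, not just the script. If an editor wanted to fold these forms to their ordinary counterparts (for search, say; an assumption, not something discussed in this thread), Unicode NFKC normalization does that. A short illustration using only Python's standard `unicodedata` module; `columns` is a hypothetical helper:

```python
import unicodedata


def columns(ch: str) -> int:
    """Assumed terminal width: 2 for Wide/Fullwidth, 1 otherwise (incl. Halfwidth)."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


# U+FF71 half-width katakana A, U+30A2 katakana A,
# U+FF21 full-width Latin A, U+0041 ASCII A
for ch in ("\uff71", "\u30a2", "\uff21", "A"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"class={unicodedata.east_asian_width(ch)}, columns={columns(ch)}")

# NFKC maps half-width kana to ordinary kana and full-width Latin to ASCII.
folded = unicodedata.normalize("NFKC", "\uff71\uff21")
print(folded)  # アA
```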

Answer 4 · 2020-10-30T00:41:34.000Z

Indeed, totally forgot about the half-width ones because I never use them 😅 Thank you for the addition @sogaiu

Answer 5 · 2020-10-30T01:05:16.000Z

Yeah, I think they tend to be the types of things that one wishes one's IME doesn't accidentally end up producing :)

Answer 6 · 2021-01-30T08:28:25.000Z

Hi @daisybytes @sogaiu

I have made an attempt to better display double-width characters. I don't think it resolves this issue, but it is a step on the way.
See commit 50d4f32.
I have created a small function to determine whether a character is double width. It is not very precise and tries to match on character intervals. Improving the handling of double-width characters might boil down to improving this function.
If it works and you have some better intervals to use, please just write me.
The first condition in the function opts out early for the most commonly used characters.
Words like こんにちは seem to be displayed better (an example I grabbed for testing purposes).
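The function is only outlined here (interval matching, plus a first condition that exits early for common characters), and the code in 50d4f32 is not reproduced. The following is a hypothetical Python sketch of that general approach, not Mogens' implementation. The interval table is an incomplete selection modelled on Markus Kuhn's widely copied `wcwidth.c`, and `WIDE_RANGES` and `is_double_width` are illustrative names. Keeping the intervals sorted lets a binary search (`bisect`) find the candidate range in logarithmic time, and the early exit below U+1100 skips the search entirely for ASCII, Latin, Greek and Cyrillic text.

```python
from bisect import bisect_right

# Hypothetical, incomplete table of inclusive double-width ranges,
# modelled on Markus Kuhn's wcwidth.c (not the table from 50d4f32).
WIDE_RANGES = [
    (0x1100, 0x115F),    # Hangul Jamo initial consonants
    (0x2E80, 0x303E),    # CJK radicals, Kangxi radicals, CJK symbols/punctuation
    (0x3040, 0xA4CF),    # Hiragana, Katakana, ..., CJK ideographs, Yi
    (0xAC00, 0xD7A3),    # Hangul syllables
    (0xF900, 0xFAFF),    # CJK compatibility ideographs
    (0xFE30, 0xFE6F),    # CJK compatibility forms
    (0xFF00, 0xFF60),    # Fullwidth forms
    (0xFFE0, 0xFFE6),    # Fullwidth signs
    (0x20000, 0x2FFFD),  # Supplementary Ideographic Plane
    (0x30000, 0x3FFFD),  # Tertiary Ideographic Plane
]
_STARTS = [start for start, _ in WIDE_RANGES]


def is_double_width(ch: str) -> bool:
    cp = ord(ch)
    # Early opt-out: nothing below U+1100 is in the table, so common
    # characters never reach the binary search.
    if cp < 0x1100:
        return False
    # Index of the last range whose start is <= cp.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= WIDE_RANGES[i][1]


print(sum(2 if is_double_width(c) else 1 for c in "こんにちは"))
```

Emoji and other wide characters outside these ranges would still be reported as single width, so a complete table would need to follow the East Asian Width data of whichever Unicode version the terminal implements.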

Answer 7 · 2021-02-01T01:54:33.000Z

Hi Mogens,

I can confirm that the Japanese strings now get colored correctly 👍 Since this issue most likely won't affect many people, I think you can close it and perhaps keep an eye on it, at very low priority, in future releases.

Thanks for your time!

Answer 8 · 2021-10-24T02:39:34.000Z

This should be alright for now 😅