janlelis/unicode-display_width

Bold/Italic Unicode characters incorrect width

kalemi19 opened this issue · 3 comments

Example: 𝗕𝗼𝗹𝗱

JavaScript counts this as 8 characters (just like emojis, each bold character has a length of 2).

Ruby counts this word as 4 characters, causing an inconsistency with the frontend.

I just tried it with this Gem, but Unicode::DisplayWidth.of("𝗕𝗼𝗹𝗱") still returns 4.

Is this a bug or is there something I need to do in order to make it work for my use case?

Thank you

Hi @kalemi19,

unfortunately, the Unicode standard does not define a definitive way to calculate the exact width of a string. However, it does provide EastAsianWidth.txt, which lists 𝗕 as a neutral/narrow letter.

Which method do you use to retrieve the character count in JavaScript?

The standard String.length property.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length

P.S. Sorry for the late reply

The length returned by JavaScript is the number of code units required to represent the data in UTF-16. You can use the unibits utility to get a lower-level view of the data:

𝗕                                       𝗼                                       
U+1D5D5                                 U+1D5FC                                 
35        D8        D5        DD        35        D8        FC        DD        
00110101  11011000  11010101  11011101  00110101  11011000  11111100  11011101  

𝗹                                       𝗱                                       
U+1D5F9                                 U+1D5F1                                 
35        D8        F9        DD        35        D8        F1        DD        
00110101  11011000  11111001  11011101  00110101  11011000  11110001  11011101  

Each code point (i.e. character) here is made of 4 bytes, which represent the high and the low code unit (a surrogate pair) in UTF-16. This is why JavaScript reports a length of 2 for each of these characters (also see https://en.wikipedia.org/wiki/UTF-16).
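The bytes in the dump above can be reproduced by hand. The following is a minimal Ruby sketch of the standard UTF-16 surrogate pair arithmetic; the helper name `surrogate_pair` is made up for illustration and is not part of unibits or unicode-display_width.

```ruby
# Hypothetical helper: split a code point above U+FFFF into the two
# UTF-16 code units (high and low surrogate) that encode it.
def surrogate_pair(code_point)
  unless code_point.between?(0x10000, 0x10FFFF)
    raise ArgumentError, "code point does not need a surrogate pair"
  end

  offset = code_point - 0x10000
  high = 0xD800 + (offset >> 10)   # upper 10 bits of the offset
  low  = 0xDC00 + (offset & 0x3FF) # lower 10 bits of the offset
  [high, low]
end

high, low = surrogate_pair("𝗕".ord)
puts format("%04X %04X", high, low)
# prints "D835 DDD5"

# Ruby's own encoder agrees; in little-endian order each code unit is
# written low byte first, which gives the 35 D8 D5 DD seen in the dump.
p "𝗕".encode("UTF-16LE").bytes.map { |byte| format("%02X", byte) }
# prints ["35", "D8", "D5", "DD"]
```

Characters in the Basic Multilingual Plane (up to U+FFFF) need only one code unit, so a plain "B" has length 1 in JavaScript while 𝗕 has length 2.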

What this library (unicode-display_width) does is assign a width to each code point, using each code point's EastAsianWidth property as one major factor (see https://www.unicode.org/reports/tr11/#Overview). As stated in the first comment, the bold characters are not classified as full-width, which is why they are counted as being just 1 terminal space wide.
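If the goal is to match the frontend's String.length rather than to measure terminal width, one option is to count UTF-16 code units on the Ruby side. This is only a sketch using Ruby's built-in String#encode; the helper name `utf16_length` is an assumption for illustration and is not something this gem provides.

```ruby
# Hypothetical helper: number of UTF-16 code units in a string, which is
# the same quantity JavaScript's String.length reports.
# Note: encode raises if the input contains invalid byte sequences.
def utf16_length(string)
  # Every UTF-16 code unit is exactly 2 bytes wide.
  string.encode("UTF-16LE").bytesize / 2
end

word = "𝗕𝗼𝗹𝗱"

puts word.length        # code points
# prints 4
puts utf16_length(word) # UTF-16 code units
# prints 8
puts utf16_length("Bold")
# prints 4
```

Whether to compare code points or code units is a design choice: counting code points on both sides (for example with [...str].length in JavaScript) would also make the two agree.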