UTF-8
bingoohuang commented
Encoding rules
1st Byte | 2nd Byte | 3rd Byte | 4th Byte | Number of Free Bits | Maximum Expressible Unicode Value |
---|---|---|---|---|---|
0xxxxxxx | | | | 7 | 007F hex (127) |
110xxxxx | 10xxxxxx | | | (5+6)=11 | 07FF hex (2047) |
1110xxxx | 10xxxxxx | 10xxxxxx | | (4+6+6)=16 | FFFF hex (65535) |
11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | (3+6+6+6)=21 | 10FFFF hex (1,114,111) |
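The bit layout in the table can be sketched directly in Go. The helper `encodeUTF8` below is hypothetical (it is not part of the standard library) and assumes a valid, non-negative code point; it skips the surrogate and range checks that `utf8.EncodeRune` performs, and is only meant to show where the free bits go.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeUTF8 is a hypothetical helper that packs a code point into 1-4 bytes
// following the table's bit layout. It assumes r is a valid code point and
// does no validation.
func encodeUTF8(r rune) []byte {
	switch {
	case r <= 0x7F: // 0xxxxxxx: 7 free bits
		return []byte{byte(r)}
	case r <= 0x7FF: // 110xxxxx 10xxxxxx: 5+6 free bits
		return []byte{0xC0 | byte(r>>6), 0x80 | byte(r)&0x3F}
	case r <= 0xFFFF: // 1110xxxx 10xxxxxx 10xxxxxx: 4+6+6 free bits
		return []byte{0xE0 | byte(r>>12), 0x80 | byte(r>>6)&0x3F, 0x80 | byte(r)&0x3F}
	default: // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 3+6+6+6 free bits
		return []byte{0xF0 | byte(r>>18), 0x80 | byte(r>>12)&0x3F, 0x80 | byte(r>>6)&0x3F, 0x80 | byte(r)&0x3F}
	}
}

func main() {
	// One sample per row of the table: 'A', U+00E9, U+2744 (snowflake), U+1F43B (bear).
	for _, r := range []rune{'A', '\u00e9', '\u2744', '\U0001F43B'} {
		buf := make([]byte, utf8.UTFMax)
		n := utf8.EncodeRune(buf, r) // standard library result, for comparison
		fmt.Printf("%U  manual: % x  stdlib: % x\n", r, encodeUTF8(r), buf[:n])
	}
}
```

For each sample the two byte sequences should match; in real code, prefer `utf8.EncodeRune`, which also rejects invalid code points.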
Bear plus snowflake equals polar bear
https://andysalerno.com/posts/weird-emojis/#
👩🏾 + ❤ + 💋 + 👩🏻 = 👩🏾‍❤️‍💋‍👩🏻
🐻 (bear; U+1F43B)
+ ❄ (snowflake; U+2744)
= 🐻‍❄️ (polar bear; U+1F43B U+200D U+2744 U+FE0F)
So, as we have learned, a single visible character can take multiple bytes, and it can also be built from multiple Unicode code points. Such sequences can be quite large: 35 bytes in the earlier example.
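The byte counts above can be checked with a short sketch. The helper name `describe` is made up for illustration, and the strings are written with escapes so the example does not depend on editor or font support for the emoji.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe returns the UTF-8 byte length and the code point count of s.
func describe(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// Polar bear: bear + zero-width joiner + snowflake + variation selector.
	polarBear := "\U0001F43B\u200D\u2744\uFE0F"
	// Kiss: woman + skin tone, ZWJ, heart + variation selector, ZWJ,
	// kiss mark, ZWJ, woman + skin tone.
	kiss := "\U0001F469\U0001F3FE\u200D\u2764\uFE0F\u200D\U0001F48B\u200D\U0001F469\U0001F3FB"

	for _, s := range []string{polarBear, kiss} {
		nBytes, nRunes := describe(s)
		fmt.Printf("%s: %d bytes, %d code points\n", s, nBytes, nRunes)
		// Ranging over a string yields code points, not bytes.
		for _, cp := range s {
			fmt.Printf("  %U (%d bytes)\n", cp, utf8.RuneLen(cp))
		}
	}
}
```

The polar bear comes out at 13 bytes in 4 code points, and the kiss sequence at 35 bytes in 10 code points.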
package main

import (
	"fmt"
	"reflect"
)

func main() {
	fmt.Println("💋 as a binary code point:", fmt.Sprintf("%08b", '💋'), "printed as strings:", runesAsStrings([]rune("💋")))
	fmt.Println("👩🏾‍❤️‍💋‍👩🏻 as runes:", []rune("👩🏾‍❤️‍💋‍👩🏻"), "printed as strings:", runesAsStrings([]rune("👩🏾‍❤️‍💋‍👩🏻")))
	fmt.Println("👩🏿 as runes:", []rune("👩🏿"), "printed as strings:", runesAsStrings([]rune("👩🏿")))
	fmt.Println("👩‍🚀️ as runes:", []rune("👩‍🚀️"), "printed as strings:", runesAsStrings([]rune("👩‍🚀️")))

	// Creating runes: two printable letters and the bell control character.
	rune1 := 'B'
	rune2 := 'g'
	rune3 := '\a'

	// Displaying each rune, its binary value, its code point, and its type (int32).
	fmt.Printf("Rune 1: %c; %08b Unicode: %U; Type: %s\n", rune1, rune1, rune1, reflect.TypeOf(rune1))
	fmt.Printf("Rune 2: %c; %08b Unicode: %U; Type: %s\n", rune2, rune2, rune2, reflect.TypeOf(rune2))
	fmt.Printf("Rune 3: %c; %08b Unicode: %U; Type: %s\n", rune3, rune3, rune3, reflect.TypeOf(rune3))
}
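
// runesAsStrings converts each rune to a string and concatenates the results,
// which reproduces the original text.
func runesAsStrings(runes []rune) (s string) {
	for _, r := range runes {
		s += string(r)
	}
	return
}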
That's why it's called a rune (a code point), and not a grapheme cluster ;)
https://www.reddit.com/r/golang/comments/o1o5hr/fyi_a_single_go_rune_is_not_the_same_as_a_single
- String length (in bytes) is not always the rune count
- Rune count is not always display width (in a monospace font)
- Unicode is hard
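One practical consequence of the first point is that slicing a string by byte index can cut a code point in half. The sketch below uses a hypothetical helper, `firstRunes`, to cut on code point boundaries instead; it uses only the standard library and makes no attempt to handle grapheme clusters or display width.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes is a hypothetical helper that returns the first n code points
// of s without splitting a multi-byte sequence.
func firstRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	// Ranging over a string yields the byte index at which each code point starts.
	for i := range s {
		if count == n {
			return s[:i]
		}
		count++
	}
	return s // s has n or fewer code points
}

func main() {
	s := "h\u00e9llo, \u4e16\u754c" // "héllo, 世界"

	fmt.Println("bytes:", len(s), "runes:", utf8.RuneCountInString(s))

	// Slicing by byte index cuts the two-byte é in half.
	byteCut := s[:2]
	fmt.Println("byte slice is valid UTF-8:", utf8.ValidString(byteCut))

	// Cutting on code point boundaries keeps the text valid.
	runeCut := firstRunes(s, 2)
	fmt.Println("rune slice:", runeCut, "valid:", utf8.ValidString(runeCut))
}
```

Cutting by code point can still separate a base character from a combining mark or break a ZWJ emoji sequence, so it is not the same as cutting by grapheme cluster.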