Dhghomon/rust-fsharp

Rust chars are not UTF-8

Issue closed · 1 comment

The guide currently says: "Rust char in F# is a char (.NET Char). Rust char is UTF-8, while in F# they are UTF-16."

That's not right: a Rust char holds a single Unicode scalar value, so it is effectively UTF-32. The exact encoding isn't spelled out, but the docs do guarantee that a char is four bytes wide:

Representation

char is always four bytes in size. This is a different representation than a given character would have as part of a String...

https://doc.rust-lang.org/std/primitive.char.html#representation
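To make the difference concrete, here's a small sketch using only standard-library calls (`std::mem::size_of`, `char::len_utf8`, `char::len_utf16`). The helper `char_layout` is hypothetical, written just for this illustration. It shows that every char takes four bytes and stores the scalar value directly, while the same character needs a variable number of bytes in UTF-8 and, outside the Basic Multilingual Plane, two UTF-16 code units. That last point is why a Rust char doesn't map one-to-one onto a .NET Char.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns
/// (size of a char in memory, scalar value, UTF-8 length in bytes, UTF-16 length in code units).
fn char_layout(c: char) -> (usize, u32, usize, usize) {
    (size_of::<char>(), c as u32, c.len_utf8(), c.len_utf16())
}

fn main() {
    // ASCII, a Latin-1 letter, and a character outside the BMP (U+1F980).
    for &c in &['a', 'é', '🦀'] {
        let (size, scalar, utf8, utf16) = char_layout(c);
        println!(
            "{:?}: {} bytes as char, U+{:04X}, {} UTF-8 byte(s), {} UTF-16 unit(s)",
            c, size, scalar, utf8, utf16
        );
    }
}
```

Every char reports four bytes, but '🦀' needs a UTF-16 surrogate pair, so it would take two .NET Char values.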

On the other hand, Rust strings (String and str) are UTF-8 encoded. A String is a wrapper around a Vec<u8> (https://doc.rust-lang.org/src/alloc/string.rs.html#279-281), and a str is a slice of bytes that is guaranteed to be valid UTF-8.
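A minimal sketch of the byte-level view, assuming only the standard `str::len`, `str::chars`, `String::into_bytes` and `String::from_utf8` methods; the helper `utf8_view` is hypothetical. It shows that a string's length is counted in UTF-8 bytes rather than characters, and that turning a Vec<u8> back into a String requires valid UTF-8.

```rust
/// Hypothetical helper: returns (length in UTF-8 bytes, number of chars, raw bytes).
fn utf8_view(s: &str) -> (usize, usize, Vec<u8>) {
    // len() counts UTF-8 bytes, not characters.
    let byte_len = s.len();
    // chars() decodes the UTF-8 bytes into 4-byte chars one at a time.
    let char_count = s.chars().count();
    // into_bytes() consumes the (freshly copied) String and returns its Vec<u8> buffer.
    let bytes = s.to_string().into_bytes();
    (byte_len, char_count, bytes)
}

fn main() {
    let (byte_len, char_count, bytes) = utf8_view("héllo");
    println!("{} bytes, {} chars, bytes = {:?}", byte_len, char_count, bytes);

    // Going the other way, String::from_utf8 validates the Vec<u8>.
    let ok = String::from_utf8(vec![0x68, 0xC3, 0xA9]); // "hé"
    let bad = String::from_utf8(vec![0xFF]); // 0xFF never appears in UTF-8
    println!("valid: {:?}, invalid is_err: {}", ok, bad.is_err());
}
```

Here 'é' takes two bytes (0xC3 0xA9), so "héllo" is five chars but six bytes.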

Oh yeah, silly me. I do have a note on String being Vec<u8> though.