Support Unicode escape sequences in characters

Question


Opened this issue 4 years ago · 1 comment

Character literals do not currently accept Unicode escape sequences of the form \uHHHH and \uHHHHHHHH (where each H is a hexadecimal digit).
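
As a rough sketch of what accepting these escapes could involve, the C function below decodes the body of a character literal (the text between the quotes) that is exactly \uHHHH or \uHHHHHHHH. The name `decode_unicode_escape`, the rule that the literal's length selects 4 or 8 digits, and the rejection of surrogates and values above U+10FFFF are all assumptions made for illustration, not existing N⋆ behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hexadecimal digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode a literal body of length `len` consisting of
   a backslash, the letter 'u', then exactly 4 or 8 hex digits.
   Stores the code point in *out and returns true on success. */
static bool decode_unicode_escape(const char *body, size_t len, uint32_t *out) {
    if (len != 6 && len != 10) return false;    /* 2 prefix chars + 4 or 8 digits */
    if (body[0] != '\\' || body[1] != 'u') return false;

    uint32_t value = 0;
    for (size_t i = 2; i < len; i++) {
        int digit = hex_value(body[i]);
        if (digit < 0) return false;
        value = (value << 4) | (uint32_t)digit;  /* 8 digits fill exactly 32 bits */
    }

    /* Reject values that are not Unicode scalar values:
       anything above U+10FFFF, and UTF-16 surrogates. */
    if (value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)) return false;

    *out = value;
    return true;
}
```

Rejecting surrogates here is a design choice: it guarantees every accepted character can later be encoded as valid UTF-8.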

It would be good to support these. However, this may complicate typechecking a little. What is the size of a Unicode character? How well would it integrate with compilation? Will there be any currently known problems with supporting Unicode characters?

These questions need to be answered first. Whether this gets worked on will be decided after that.

Character literals may also suffer from the same bug as #3; if so, that bug will need to be fixed here as well.

Answer 1 · 2022-07-13T08:20:59.000Z

What is the size of a Unicode character?

Languages that support Unicode characters out of the box typically use 32-bit integers to represent them. This is somewhat wasteful in most scenarios (the largest code point, U+10FFFF, needs only 21 bits), but it allows any Unicode code point to be stored.
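
To make the trade-off concrete, here is a small C sketch of the standard UTF-8 encoding of a code point (the function name `utf8_encode` is illustrative only). A fixed 32-bit character always occupies 4 bytes, while UTF-8 spends 1 byte on ASCII and only needs 4 bytes above U+FFFF; that gap is the waste mentioned above, and 32 bits is the smallest common fixed-width integer that holds every code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a Unicode scalar value as UTF-8 into `out`.
   Returns the number of bytes written (1 to 4), or 0 if `cp` is not a
   valid scalar value (above U+10FFFF, or a UTF-16 surrogate). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x80) {                 /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For instance, U+0041 ('A') encodes to one byte and U+1F600 to four, while a 32-bit char stores both in 4 bytes.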

How well would it integrate with compilation?

An opaque built-in char type should mostly do the job. It can be treated as a u32 or s32 when compiling.
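
One possible shape for such a type, sketched in C purely for illustration (this is not a claim about N⋆'s actual backend, and the names `nstar_char`, `nstar_char_from_u32` and `nstar_char_to_u32` are hypothetical): wrapping a `uint32_t` in a single-member struct keeps characters distinct from plain integers, while the generated code still handles an ordinary 32-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opaque character type: a one-member struct, so it cannot
   be mixed up with plain integers by accident. */
typedef struct {
    uint32_t code_point;
} nstar_char;

/* On mainstream ABIs the wrapper adds no padding, so it has the size of a u32. */
_Static_assert(sizeof(nstar_char) == sizeof(uint32_t),
               "nstar_char should be exactly 32 bits wide");

/* Build a character from a u32, rejecting values that are not Unicode
   scalar values (above U+10FFFF, or UTF-16 surrogates). */
static bool nstar_char_from_u32(uint32_t value, nstar_char *out) {
    if (value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)) return false;
    out->code_point = value;
    return true;
}

/* Recover the underlying u32, e.g. for comparisons or code generation. */
static uint32_t nstar_char_to_u32(nstar_char c) {
    return c.code_point;
}
```

Since every valid code point is at most 0x10FFFF, an s32 would hold exactly the same values, so either lowering mentioned in the answer works.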

Will there be any currently known problems with supporting Unicode characters?

Some functions from the C standard library (e.g. strlen) count bytes (8-bit units) rather than Unicode code points.
However, this is technically not N⋆'s problem.
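
To illustrate the difference, the C sketch below compares `strlen` with a hypothetical helper, `utf8_code_points`, that counts code points in a UTF-8 string by skipping continuation bytes (bytes of the form 10xxxxxx). It assumes the input is well-formed UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by counting every
   byte that is NOT a continuation byte (10xxxxxx). Assumes valid UTF-8. */
static size_t utf8_code_points(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) count++;
    }
    return count;
}
```

For example, with "h\xC3\xA9llo" (the word "héllo" written with byte escapes so it does not depend on the source file's encoding), `strlen` returns 6 while `utf8_code_points` returns 5.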