Support Unicode escape sequences in characters
Characters do not currently accept Unicode escape sequences of the form `\uHHHH` and `\uHHHHHHHH` (where `H` is any hexadecimal digit).
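To make the two escape forms concrete, here is a minimal C sketch of how a lexer could recognise them. The helper names (`hex_value`, `parse_unicode_escape`) are hypothetical and not part of N⋆. The rule that a run of 8 hex digits is read as the long form, and anything from 4 to 7 digits as the short form, is an assumption made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: parse an escape of the form \uHHHH or \uHHHHHHHH
 * at the start of the NUL-terminated string s.
 * Assumption: 8 consecutive hex digits select the long form; otherwise
 * only the first 4 digits are consumed.
 * On success, stores the value in *out and returns the number of
 * characters consumed (6 or 10). Returns 0 if the escape is malformed.
 */
size_t parse_unicode_escape(const char *s, uint32_t *out) {
    if (s[0] != '\\' || s[1] != 'u') return 0;

    const char *digits = s + 2;
    size_t n = 0;
    /* Count leading hex digits, up to 8; the NUL terminator stops the loop. */
    while (n < 8 && hex_value(digits[n]) >= 0) n++;

    if (n < 4) return 0; /* too few digits for either form */
    if (n < 8) n = 4;    /* fall back to the short form */

    uint32_t value = 0;
    for (size_t i = 0; i < n; i++)
        value = (value << 4) | (uint32_t)hex_value(digits[i]);

    *out = value;
    return 2 + n;
}
```

With this sketch, the input `\u00e9` consumes 6 characters and `\u0001F600` consumes 10. Inside a character literal the escape is followed by the closing quote, so the short/long ambiguity should not arise there in practice.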
It would be good to support those. However, this may complicate typechecking a little bit. What is the size of a Unicode character? How well would it integrate with compilation? Are there any currently known problems with supporting Unicode characters?
Those questions need to be answered first. Whether this gets worked on will be decided after that step.
Characters may also suffer from the same bug as #3. If so, that will need to be fixed as well.
What is the size of a Unicode character?
Several languages that support Unicode characters out of the box (for example Rust's `char` and Go's `rune`) use 32-bit integers to encode characters. This is a little wasteful in most scenarios, since the largest code point, U+10FFFF, needs only 21 bits, but it allows storing any Unicode code point.
How well would it integrate with compilation?
An opaque builtin `char` type should pretty much do the job. We can treat it as a `u32` or `s32` when compiling.
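As a rough illustration of the 32-bit representation, the C sketch below stores a character as a `uint32_t` and checks that a value is a valid Unicode scalar value. The names `nstar_char` and `is_unicode_scalar` are hypothetical and only stand in for whatever the compiler would actually use. Because the largest code point is U+10FFFF, every valid value is below 2^31, so the choice between `u32` and `s32` does not change which characters can be represented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lowered representation of the builtin char type. */
typedef uint32_t nstar_char;

/*
 * A Unicode scalar value is any code point in 0..0x10FFFF except the
 * surrogate range 0xD800..0xDFFF, which is reserved for UTF-16.
 */
bool is_unicode_scalar(nstar_char cp) {
    if (cp > 0x10FFFFu) return false;                 /* beyond the code space */
    if (cp >= 0xD800u && cp <= 0xDFFFu) return false; /* surrogate */
    return true;
}
```

A check like this could run when an escape sequence is lexed, so that a literal such as `\u00110000` is rejected early instead of producing a value that is not a character.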
Will there be any currently known problems with supporting Unicode characters?
Some functions from the C standard library (e.g. `strlen`) do not count Unicode code points but 8-bit units (bytes), so they report the encoded length of a string rather than its number of characters.
However, this is technically not N⋆'s problem.
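The mismatch is easy to see with a small C sketch. It assumes the string is valid UTF-8, which is an assumption here and not something this issue decides for N⋆. The helper names `utf8_codepoints` and `utf8_extra_bytes` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count code points in a valid, NUL-terminated UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte that
 * does not match that pattern starts a new code point.
 */
size_t utf8_codepoints(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) count++;
    }
    return count;
}

/* Number of bytes that strlen counts beyond the number of code points. */
size_t utf8_extra_bytes(const char *s) {
    return strlen(s) - utf8_codepoints(s);
}
```

For the UTF-8 encoding of `é` (the bytes `0xC3 0xA9`), `strlen` returns 2 while `utf8_codepoints` returns 1. Code that needs character counts would have to use a helper of this kind instead of `strlen`.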