Indexing into strings with special characters doesn't work properly
Hjagu09 opened this issue · 2 comments
Hjagu09 commented
example
print "å"[0]
print string_to_unicode("å")
output
195
[229]
expected output
229
[229]
testing with more characters gives me this:
- 195 for åäöøæ
- 226 for ←↓↑→
- 194 for ¹²³ª
The same thing happens with for loops.
aardappel commented
I'm afraid this does work properly, as indexing works by byte, not by unicode character.
Strings use a UTF-8 representation, in which a character can occupy more than one byte, so O(1) indexing by unicode character would not be possible.
This is exactly the reason we have string_to_unicode: to turn it into a vector, which is indexable by unicode code point.
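To illustrate what such a conversion involves, here is a minimal C++ sketch of decoding UTF-8 bytes into a vector of code points. The name `decode_utf8` is hypothetical, and this is not Lobster's actual implementation of string_to_unicode; it assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, NOT Lobster's implementation of string_to_unicode.
// Decodes UTF-8 bytes into code points. Assumes well-formed input:
// continuation bytes and overlong encodings are not validated.
std::vector<std::uint32_t> decode_utf8(const std::string &s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte determines how many bytes the sequence occupies.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

With this sketch, `decode_utf8("\xC3\xA5")` (the two bytes of "å") yields a one-element vector holding 229, which matches the `[229]` printed in the report above. Decoding has to walk the whole string, which is why it is a separate step rather than something indexing does implicitly.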
If you index a C++ std::string, you'll get the same result. Much like C++, a Lobster string does not promise that its contents are UTF-8 (we use strings for arbitrary binary buffers), only that if you store string data in it, it will be UTF-8.
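The C++ comparison can be checked with a small sketch. The helper name `byte_at` is hypothetical, and the string literals use explicit byte escapes so the result does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the byte at index i as an unsigned value (0-255).
// Plain char may be signed, so cast to unsigned char before widening to int.
int byte_at(const std::string &s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// Number of bytes in the string, which is what std::string::size() reports.
// For "å" encoded as UTF-8 this is 2, not 1.
std::size_t byte_length(const std::string &s) {
    return s.size();
}
```

Here `byte_at("\xC3\xA5", 0)` (that is, "å") is 195 and `byte_length` of the same string is 2. The lead byte of "←" (`"\xE2\x86\x90"`) is 226, and the lead byte of "¹" (`"\xC2\xB9"`) is 194, in line with the values reported at the top of this issue.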
Hjagu09 commented
Thank you, this issue can be closed