NaN integers can be returned by `Char.toCode`

Question

alch-emi opened this issue a year ago · 2 comments

It seems that, as of version 0.19.1, it's possible to construct a `NaN : Int` through the expression:

`Char.fromCode 0xd800 |> Char.toCode`

After some research, this seems unintended (and I think NaN ints are unintended in general?). The expected return value would seem to be 0xFFFD (aka �), which is what you get when you feed most other invalid Unicode code points to this expression.
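For anyone who wants to reproduce the contrast, here is a minimal sketch. The module and value names are made up for illustration, and the comments repeat only what this report describes for 0.19.1; `0x110000` is assumed to be one of the "other invalid" inputs because it lies above the Unicode range.

```elm
module NaNCodeDemo exposing (loneSurrogateCode, outOfRangeCode)

-- Hypothetical names, for illustration only.


{-| The expression from this report. On 0.19.1 it is reported to
evaluate to NaN even though its type is Int.
-}
loneSurrogateCode : Int
loneSurrogateCode =
    Char.fromCode 0xD800 |> Char.toCode


{-| An input above the Unicode range (assumed example). Per the report,
most invalid inputs come back as 0xFFFD, the replacement character.
-}
outOfRangeCode : Int
outOfRangeCode =
    Char.fromCode 0x110000 |> Char.toCode
```

Evaluating both values in `elm repl` should show the difference directly.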

~~I did a cursory search of other issues in this repo and it doesn't seem like anyone else has opened an issue for this, but please excuse me if I have missed something.~~ oh i just read the duplicates policy! that's lovely!!

Thank you for your time!

OS: NixOS 24.11 on Linux 6.6.63

Occurs in the REPL and Firefox 133.0

Answer 1 · 2024-12-11T01:40:37.000Z

Thanks for reporting this! To set expectations:

- Issues are reviewed in batches, so it can take some time to get a response.
- Ask questions in a community forum. You will get an answer quicker that way!
- If you experience something similar, open a new issue. We like duplicates.

Finally, please be patient with the core team. They are trying their best with limited resources.

Answer 2 · 2024-12-11T08:47:46.000Z

We found this behavior a while ago and wrote a small explainer:
https://github.com/stil4m/elm-syntax/blob/master/src/Char/Extra.elm#L370-L384

Funnily enough, without some form of this behavior, the elm-syntax parser would be many times slower because there is no other way to check for UTF-16 surrogates currently.
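As a rough illustration of how a check could lean on this behavior, here is a sketch. It is not necessarily what elm-syntax does (the linked explainer is the authority), and `codeIsNaN` is a hypothetical name. It assumes `Char.toCode` yields NaN for a `Char` holding a lone surrogate such as 0xd800, which is the only case confirmed in this thread.

```elm
module SurrogateSketch exposing (codeIsNaN)

-- Hypothetical helper, for illustration only.


{-| True when `Char.toCode` produces NaN for the given character.
`isNaN` takes a Float, so the Int is converted first.
-}
codeIsNaN : Char -> Bool
codeIsNaN char =
    isNaN (toFloat (Char.toCode char))
```

Because this depends on behavior that looks unintended, a check like this could stop working if `Char.toCode` is changed to return 0xFFFD instead.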