purescript/purescript-strings

fromCharCode BMP

jamesdbrock opened this issue · 5 comments

fromCharCode should return Nothing if the code is outside the Basic Multilingual Plane, which is the range a Char can hold, right?

fromCharCode = toEnum

>>> show $ fromCharCode 65900

(Just 'Ŭ')

The Bounded instance for Char says that “Characters fall within the Unicode range,” but the documentation for Char says it is “guaranteed to contain one code unit.”

Oh interesting, it appears this is actually the line at fault:
https://github.com/purescript/purescript-enums/blob/170d959644eb99e0025f4ab2e38f5f132fd85fa4/src/Data/Enum.purs#L316-L318

It's using top and bottom for ints, not chars. I guess n >= toCharCode bottom && n <= toCharCode top might work?
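To make the suggested bounds check concrete, here is a minimal sketch in TypeScript rather than PureScript. The names charFromCode, CHAR_BOTTOM and CHAR_TOP are hypothetical and are not part of purescript-enums. The sketch assumes that bottom and top for Char correspond to code units 0x0000 and 0xFFFF, and it uses null to stand in for Nothing.

```typescript
// Hypothetical model of the proposed check; not the purescript-enums API.
// Assumption: toCharCode bottom = 0x0000 and toCharCode top = 0xFFFF.
const CHAR_BOTTOM = 0x0000;
const CHAR_TOP = 0xffff;

// Returns a one-code-unit string, or null (standing in for Nothing)
// when n lies outside the Char bounds.
function charFromCode(n: number): string | null {
  if (!Number.isInteger(n) || n < CHAR_BOTTOM || n > CHAR_TOP) {
    return null;
  }
  return String.fromCharCode(n);
}

console.log(charFromCode(65));    // a one-code-unit string
console.log(charFromCode(65900)); // null, because 65900 > 0xFFFF
```

With the check against the Char bounds instead of the Int bounds, 65900 is rejected up front rather than being passed through to String.fromCharCode.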

String.fromCharCode effectively computes code % 0x10000, so what you're seeing is 65900 % 0x10000 = 0x16C, which is the code unit for 'Ŭ'.
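The wraparound can be reproduced directly in JavaScript or TypeScript. This is only an illustrative sketch; wrapCharCode is a hypothetical helper name, and the modulo models the 16-bit truncation that String.fromCharCode applies to a non-negative integer argument.

```typescript
// Hypothetical helper: models the 16-bit truncation applied by
// String.fromCharCode to a non-negative integer code.
function wrapCharCode(code: number): number {
  return code % 0x10000;
}

const wrapped = wrapCharCode(65900);
console.log(wrapped.toString(16)); // "16c"

// Passing the out-of-range code directly gives the same character
// as passing the wrapped value.
console.log(String.fromCharCode(65900) === String.fromCharCode(wrapped)); // true
```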

I've opened an issue in purescript-enums to track this. Should this issue be closed?

I think it’s reasonable for it to stay open until the upstream issue is addressed.

Technically, we still need a release of that library and then a dependency update here.

PR ready for approval: #163.