JohnEarnest/Decker

unicode support

farvardin opened this issue · 6 comments

Neither Decker nor Lil seems to support Unicode. Accented letters are converted into ??.

Is it planned for later, or too complicated to include (which I can understand)?

I've tried to use French words with lil...

This is an intentional limitation.

Unicode, in its entirety, is staggering in scope. While supporting some subset of Unicode, someday, is not impossible, it would introduce a daunting amount of complexity. For pixel-perfect cross-platform results, Decker uses custom text layout routines and its own collection of bitmapped fonts, which would both need to be extended to include new glyphs for most non-English languages, and I'd also need to adapt the touch keyboard to permit entering those new glyphs.

Supporting UTF-8 in Lil itself isn't as bad, but it would introduce impedance mismatches between Lil and Decker. For example, it would no longer necessarily be the case that a Lil string or a Lil identifier is displayable in Decker. Lil can presently manipulate UTF-8 data as raw bytes using the "array" interface, but in many ways this is less convenient than working with ordinary strings.
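To illustrate why byte-level handling is less convenient than working with strings, here is a minimal sketch. It uses Python as a stand-in, not Lil, and does not show Decker's actual code. The byte-wise replacement at the end is a hypothetical routine, included only to show how a single accented letter can turn into two placeholder characters once UTF-8 is treated as raw bytes.

```python
# Illustration only: Python stands in for Lil; none of this is Decker's code.
text = "caf\u00e9"  # "café", with a precomposed e-acute (U+00E9)

# UTF-8 encodes the accented letter as two bytes, so the byte count and the
# character count no longer agree.
raw = text.encode("utf-8")
byte_values = list(raw)

print(len(text), "characters")
print(len(raw), "bytes")
print(byte_values)

# Hypothetical byte-wise fallback: every byte outside 7-bit ASCII becomes "?".
# One accented letter therefore yields two question marks.
ascii_only = bytes(b if b < 0x80 else ord("?") for b in raw).decode("ascii")
print(ascii_only)
```

Anything that indexes, slices, or measures text at the byte level has to account for this mismatch by hand, which is the inconvenience described above.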

For displaying text in Decker, some workarounds are possible; Decks can include custom fonts, and "rich text" fields can include inline bitmaps and reference different fonts for a given span, but copying and pasting "blended" text would reveal the underlying ASCII. There is some precedent for "normalizing" Unicode characters to ASCII equivalents on input (presently Decker "straightens" curly-quote characters, for example), but I doubt that applying this approach to letters would make users of languages with diacritics very happy.
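As a rough sketch of what "normalizing" on input could look like, and of why extending it to letters is lossy, consider the following Python. The mapping table and both function names are assumptions made for illustration; the thread only states that Decker straightens curly quotes, not how.

```python
import unicodedata

# Hypothetical table: curly quotes mapped to their straight ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

def straighten_quotes(s: str) -> str:
    """Replace curly quotes with straight ASCII quotes."""
    return s.translate(QUOTE_MAP)

def strip_diacritics(s: str) -> str:
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(straighten_quotes("\u201cbonjour\u201d"))
# Two different French words collapse to the same ASCII spelling.
print(strip_diacritics("p\u00eache"), strip_diacritics("p\u00e9ch\u00e9"))
```

Straightening quotes loses little, but stripping diacritics makes distinct words such as "pêche" and "péché" indistinguishable, which is the kind of result speakers of those languages are unlikely to welcome.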

Sure, I understand; it's quite beyond the scope of Decker. We'll deal with it :)

Thanks for your understanding. I do apologize for the accessibility limitations of this constraint.

What we don't want is for Decker to become bloated, so I'm fine with this limitation!

@JohnEarnest would you accept a PR for this? I think UniFont might be a good fit for decker.

No. Including UniFont would increase the baseline size of web-decker by multiple megabytes for a single font, and as previously stated I do not wish to drag in all the text layout and string manipulation complications of Unicode.

If future versions of Decker support non US-ASCII characters, I'm much more inclined to pursue an equivalent of MacRoman's extended code page and convert to and from UTF-8 at the periphery of the system; it would be possible to support French, German, Spanish, Portuguese, and a handful of other major western languages with relatively sparing changes to text layout and string representations.
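A minimal sketch of the "convert at the periphery" idea, assuming Python's built-in `mac_roman` codec as a stand-in for whatever code page Decker might adopt. The function names are hypothetical, and nothing here reflects an actual Decker design. Internally every character is exactly one byte; UTF-8 appears only at the input and output boundaries.

```python
def utf8_to_internal(data: bytes) -> bytes:
    """Convert incoming UTF-8 to a single-byte code page.

    Characters the code page cannot represent are replaced with "?".
    """
    return data.decode("utf-8").encode("mac_roman", errors="replace")

def internal_to_utf8(data: bytes) -> bytes:
    """Convert single-byte internal text back to UTF-8 for export."""
    return data.decode("mac_roman").encode("utf-8")

incoming = "fran\u00e7ais".encode("utf-8")   # 9 bytes of UTF-8
internal = utf8_to_internal(incoming)        # 8 bytes, one per character
print(len(incoming), len(internal))
print(internal_to_utf8(internal) == incoming)

# A character outside the code page degrades to a single "?".
print(utf8_to_internal("\u65e5".encode("utf-8")))
```

Because each character stays one byte wide inside the system, string lengths and indices remain simple, while text in the covered western languages round-trips without loss.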