The API needs to operate in terms of Unicode code points or scalars, not code units
alexrp opened this issue · 7 comments
Hi,
As far as I can tell, the internals of this library all use `uint`, which is what one would expect since the tables contain code point values. However, the public `UnicodeCalculator` API only speaks `char`. This is a bit of a problem since a .NET `char` is a UTF-16 code unit, not a Unicode code point / scalar. Every code point above U+FFFF (for example, most emoji) is encoded as a surrogate pair of two `char`s, so many code points that are contained in the tables here can't actually be looked up, severely limiting the usefulness (and correctness) of the library. (This is also out of line with other language ports of this library, including the Python one linked in the README.)
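To make the problem concrete, here is a minimal C# sketch (illustrative only, not code from this library) showing that U+1F600 occupies two `char`s in a .NET string, neither of which is the code point itself:

```csharp
using System;

// U+1F600 (GRINNING FACE) lies outside the Basic Multilingual Plane,
// so UTF-16 stores it as a surrogate pair: two chars, neither a valid scalar.
string text = "\U0001F600";

Console.WriteLine(text.Length);                   // 2
Console.WriteLine(char.IsHighSurrogate(text[0])); // True
Console.WriteLine(char.IsLowSurrogate(text[1]));  // True

// The real code point needs 21 bits, but char.MaxValue is 0xFFFF,
// so a char-based API can never receive it as a single argument.
int codePoint = char.ConvertToUtf32(text[0], text[1]);
Console.WriteLine($"U+{codePoint:X4}");           // U+1F600
```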
.NET Core 3.0 introduced the `System.Text.Rune` type (along with `str.EnumerateRunes()`), which would probably be ideal for the API to work with, since an instance of `Rune` is guaranteed to be a valid Unicode scalar (i.e. a valid Unicode code point that isn't a surrogate). But since `Rune` isn't available in .NET Standard 2.0, which this library targets, taking a dependency on it is probably undesirable; simply accepting `uint` would work too.
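The `Rune` behavior described above can be illustrated with a short sketch. It assumes .NET Core 3.0 or later, since that is where `System.Text.Rune` ships: `EnumerateRunes()` yields the surrogate pair as one scalar, and `Rune.IsValid` rejects surrogate code points.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Requires .NET Core 3.0 or later; System.Text.Rune is not in .NET Standard 2.0.
string text = "a\U0001F600";

// EnumerateRunes() walks Unicode scalar values rather than UTF-16 code units,
// so the surrogate pair for U+1F600 comes out as a single Rune.
var values = new List<int>();
foreach (Rune rune in text.EnumerateRunes())
{
    values.Add(rune.Value);
    Console.WriteLine($"U+{rune.Value:X4}"); // U+0061, then U+1F600
}

// A Rune can never hold a surrogate code point, which makes it a safe
// parameter type for a width lookup.
Console.WriteLine(Rune.IsValid(0xD800));  // False
Console.WriteLine(Rune.IsValid(0x1F600)); // True
```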
Yes, I'm aware of this, but operating in code units was good enough for the problem I was solving. To be really usable, it would need to use something like `Rune`, as you mention.
If this is something you need, I'm happy to add it.
I'm working on a project that requires fairly accurate measurement of arbitrary input text, so this would indeed be very useful. I was looking around to see if anyone had already done a C# port of `wcwidth` and came across this project.
I'm also happy to put together a pull request if you happen to be busy with other things.
@alexrp Absolutely, a pull request is more than welcome!
You could add support for `netcoreapp3.1` and `net5.0`, and perhaps add a preprocessor directive if you want to support `Rune` in the library.
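A minimal sketch of that suggestion, assuming the library multi-targets `netstandard2.0`, `netcoreapp3.1` and `net5.0`; the helper name `EnumerateScalars` is hypothetical. Keeping the public element type as `uint` gives every target the same signature, while the `#if` branch uses `EnumerateRunes()` where `Rune` exists and falls back to manual surrogate-pair decoding elsewhere. Note that `NETCOREAPP3_0_OR_GREATER` is only defined when building with the .NET 5 SDK or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: yields Unicode scalar values as uint on every target.
// An unpaired surrogate becomes U+FFFD on both code paths.
static IEnumerable<uint> EnumerateScalars(string s)
{
#if NETCOREAPP3_0_OR_GREATER
    // Rune exists here; EnumerateRunes() already substitutes
    // Rune.ReplacementChar (U+FFFD) for ill-formed UTF-16.
    foreach (System.Text.Rune rune in s.EnumerateRunes())
        yield return (uint)rune.Value;
#else
    // .NET Standard 2.0 fallback: decode surrogate pairs by hand.
    for (int i = 0; i < s.Length; i++)
    {
        if (char.IsSurrogatePair(s, i))
        {
            yield return (uint)char.ConvertToUtf32(s, i);
            i++; // skip the low surrogate that was just consumed
        }
        else if (char.IsSurrogate(s[i]))
        {
            yield return 0xFFFD; // unpaired surrogate is not a valid scalar
        }
        else
        {
            yield return s[i];
        }
    }
#endif
}

var scalars = new List<uint>(EnumerateScalars("a\U0001F600\uD800"));
foreach (uint scalar in scalars)
    Console.WriteLine($"U+{scalar:X4}"); // U+0061, U+1F600, U+FFFD
```

Both branches produce the same sequence, so a `uint`-based lookup could sit behind this helper unchanged, and a `Rune` overload could be added inside a similar `#if` block without affecting the .NET Standard 2.0 build.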
@alexrp Btw, I've gotten a lot of inspiration from your System.Terminal project for a new thing I'm building. Great work!
Admittedly, there are some aspects of `System.Terminal` that I've been wanting to overhaul, and I've kinda been putting that off for a while, but I'm happy to hear you found it inspirational.
Can you share more about the thing you're working on? I'm a bit of a terminal app enthusiast, so I love to hear about things people are building in this area.
@alexrp I'm currently working on a project called Spectre.Console, and as part of that we're doing a lot of manipulation using ANSI/VT codes. For that project, I want a bare-bones terminal abstraction with an ANSI/VT emulation layer on Windows for "legacy" consoles that don't support `ENABLE_VIRTUAL_TERMINAL_PROCESSING`.
I just tweeted about it here: https://twitter.com/firstdrafthell/status/1398951276113711108?s=20
Aha, that sounds fun and challenging. The Windows console APIs have a lot of weird quirks and limitations, so I'll be interested to see where that goes. It would be fantastic for libraries like ours to work on older Windows versions, though.