```
nimble install unicody
```
Unicody is an alternative to Nim's std/unicode module that is entirely focused on UTF-8.
Why create an alternative? Two primary motivating reasons:
- Currently, std/unicode handles invalid UTF-8 incorrectly. See 1 and 2.
- Working with UTF-8 for a web server has led me to want different behavior and additional procs.
I created Unicody so that these changes and additions have a place to live, and so that all of my own projects can make use of them.
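To make "invalid UTF-8" concrete: RFC 3629 requires a strict decoder to reject overlong encodings, UTF-16 surrogate code points (U+D800 to U+DFFF), code points above U+10FFFF, and truncated sequences. The following is a minimal, stdlib-only sketch of such a check. It follows the same convention as `validateUtf8` (-1 means valid, otherwise the offset of the first bad byte). `firstInvalidUtf8` is a hypothetical name used only for illustration; it is not part of Unicody or std/unicode, and this is not Unicody's implementation.

```nim
proc firstInvalidUtf8(s: openArray[char]): int =
  ## Hypothetical strict validator (illustration only, not Unicody's code).
  ## Returns -1 if `s` is valid UTF-8, else the offset of the first byte
  ## that begins an invalid or truncated sequence.
  var i = 0
  while i < s.len:
    let b0 = s[i].uint8
    if b0 < 0x80'u8:
      inc i                  # ASCII byte, always valid
      continue
    var n: int               # number of continuation bytes expected
    var lo = 0x80'u8         # allowed range for the second byte
    var hi = 0xBF'u8
    if b0 >= 0xC2'u8 and b0 <= 0xDF'u8:
      n = 1
    elif b0 == 0xE0'u8:
      n = 2
      lo = 0xA0'u8           # reject overlong 3-byte forms
    elif b0 == 0xED'u8:
      n = 2
      hi = 0x9F'u8           # reject UTF-16 surrogates
    elif b0 >= 0xE1'u8 and b0 <= 0xEF'u8:
      n = 2
    elif b0 == 0xF0'u8:
      n = 3
      lo = 0x90'u8           # reject overlong 4-byte forms
    elif b0 >= 0xF1'u8 and b0 <= 0xF3'u8:
      n = 3
    elif b0 == 0xF4'u8:
      n = 3
      hi = 0x8F'u8           # reject code points above U+10FFFF
    else:
      return i               # 0x80..0xC1 and 0xF5..0xFF never start a sequence
    if i + n >= s.len:
      return i               # sequence runs past the end of the input
    let b1 = s[i + 1].uint8
    if b1 < lo or b1 > hi:
      return i
    for k in 2 .. n:         # remaining bytes must be plain continuation bytes
      let bk = s[i + k].uint8
      if bk < 0x80'u8 or bk > 0xBF'u8:
        return i
    i += n + 1
  result = -1

doAssert firstInvalidUtf8("abcπdef") == -1
doAssert firstInvalidUtf8("\xC0\xAF") == 0          # overlong encoding of '/'
doAssert firstInvalidUtf8("ab\xED\xA0\x80") == 2    # surrogate U+D800
```

Checking the second byte against a per-lead-byte range is what catches overlongs, surrogates, and out-of-range code points without decoding the full value first.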
Unicody is written entirely in Nim, so no external libraries, dependencies, or compiler flags are necessary.
A goal for Unicody is to be a drop-in replacement for std/unicode. To enable this, wherever Unicody provides an alternative implementation of a proc, it matches the std/unicode signature. Unicody does not yet implement most procs in std/unicode, so it is not a complete drop-in replacement, but the core set of procs it does implement covers quite a lot of use-cases.

For this reason, and to avoid unnecessary annoyance, Unicody is designed to work cooperatively with std/unicode. You can import both, and `Rune` is the same type everywhere, so you can mix and match procs as needed.
Note that if you import both, you may need to qualify some calls with the module name to pick the version you want, e.g. `unicody.validateUtf8` rather than just `validateUtf8`.
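```nim
import std/options
import unicody

# Truncate to at most 10 bytes without splitting a code point (π is 2 bytes in UTF-8)
doAssert truncateUtf8("ππππππππππ", maxBytes = 10) == "πππππ"
doAssert validateUtf8("abcπdef") == -1 # Matches std/unicode proc signature
let rune = "π".validRuneAt(0) # Returns Option[Rune]
doAssert rune.isSome # A valid rune was found starting at offset 0
```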
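The `truncateUtf8` example depends on cutting only at a code point boundary. Below is a minimal sketch of one way to do that with only the standard library, assuming the input is already valid UTF-8. `truncateToBytes` is a hypothetical name, and this is not necessarily how Unicody implements `truncateUtf8`.

```nim
import std/unicode

proc truncateToBytes(s: string, maxBytes: int): string =
  ## Hypothetical helper (illustration only, not Unicody's implementation).
  ## Assumes `s` is valid UTF-8. Returns the longest prefix of `s` that is
  ## at most `maxBytes` bytes long and ends on a code point boundary.
  if s.len <= maxBytes:
    return s
  var cut = max(maxBytes, 0)
  # A byte of the form 10xxxxxx is a continuation byte, so cutting there
  # would split a code point. Step back to the start of that code point.
  while cut > 0 and (s[cut].uint8 and 0xC0'u8) == 0x80'u8:
    dec cut
  result = s[0 ..< cut]

let pi10 = "ππππππππππ"                         # 10 runes, 2 bytes each: 20 bytes
doAssert truncateToBytes(pi10, 10) == "πππππ"
doAssert truncateToBytes(pi10, 11) == "πππππ"   # byte 11 would split a π
doAssert validateUtf8(truncateToBytes(pi10, 7)) == -1
```

Because UTF-8 continuation bytes are self-identifying, the backward scan is at most three steps for valid input and never needs to decode anything.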
To run the tests:

```
nimble test
```
To help ensure Unicody does not crash or otherwise misbehave on bad input data, it has been fuzzed. You can run the fuzzer at any time with `nim c -r tests/fuzz.nim`.