Add robust unicode support (probably via ICU bindings)
steveklabnik opened this issue · 3 comments
Issue by brson
Wednesday Jun 04, 2014 at 22:09 GMT
For earlier discussion, see rust-lang/rust#14656
This issue was labelled with: A-libs, I-enhancement in the Rust repository
There's been lots of talk about unicode over the years. We have little support in the core libs, but need to provide something better for serious use. Best idea now is to wrestle libicu into a Rust crate. Start out-of-tree.
I honestly think this shouldn't be in a language's core. A good external library sounds better.
The issue is manifold: First of all, unicode isn't the solution to all problems. For example, Big5 is still a de-facto standard in some parts of Asia. I wouldn't default to unicode too much.
Another thing is that full unicode support comes with a lot of baggage: it is locale dependent, comes with a lot of specialized lingo, needs memory and disk space for tables, etc. Even systems specialised in text management (e.g. Elasticsearch) ship things like ICU as an additional plugin purely because of the disk and memory weight. Being able to fix bugs only in sync with the main language will be a problem.
A good unicode text library out of tree will probably be the best solution.
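To make the locale and normalization points above concrete, here is a minimal sketch that uses only the Rust standard library (not ICU, and not any crate mentioned in this thread). The helper names `scalar_count` and `upper` are hypothetical and exist only for illustration.

```rust
/// Hypothetical helper: counts Unicode scalar values, which is not the same
/// as counting user-perceived characters (grapheme clusters).
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

/// Hypothetical helper: uppercases using the standard library's
/// locale-independent mapping; there is no way to pass a locale here.
fn upper(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    let decomposed = "e\u{301}";
    // The same text as the single precomposed scalar U+00E9.
    let composed = "\u{e9}";

    // One perceived character, but two scalars and three UTF-8 bytes.
    assert_eq!(scalar_count(decomposed), 2);
    assert_eq!(decomposed.len(), 3);

    // std compares bytes and does not normalize, so these differ.
    assert_ne!(decomposed, composed);

    // In a Turkish locale the uppercase of "i" is "İ" (U+0130), but std
    // has no locale parameter and always yields "I".
    assert_eq!(upper("i"), "I");
}
```

Normalization, grapheme segmentation and locale-aware casing are the kind of functionality that would have to come from ICU or a comparable out-of-tree library.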
Hello folks! I hope you don't mind a bit of code advertising, especially since this particular issue seems very topical.
For some time now, in the context of Fuchsia OS, we've been working on Rust bindings for the ICU library. This was originally started as an in-tree contribution to Fuchsia, but sometime in January 2020 we moved it to a standalone repository, which you're cordially invited to check out.
Now, anyone can make a one-off binding library, and it can be a fun exercise in learning the ropes of a language and how it interfaces with other code via FFI. People have done so before; a casual search will turn up a few such projects.
But making it a robust, continuously tested library is somewhat tedious, and I am not aware that anyone else has done it. We invested some time to remove the developer toil for those who may care, so that you don't need to spend days figuring out how to install exactly the correct ICU version in exactly the right way before you can begin to work on the project.
For example, if you have Docker installed, you can clone the project and type `make docker-test` and you're off to the races.
Some features of interest:
- `rust_icu` can be built hermetically,
- It is continuously tested for ICU versions 63 through 66, with support for 67 (latest at the time of this writing) coming in a matter of days,
- It has a feature coverage tracker, though a bit simplistic.
This means it should be fairly easy for anyone to dive in and use or contribute. It's available on crates.io, and docs can be seen on docs.rs, as you'd expect.
As for the downside, the feature coverage isn't that impressive. What's there is what we needed for Fuchsia. In general, the extent of our contribution is motivated by Fuchsia's needs. That said, I think we've lowered the barrier to entry enough that it becomes practical for someone motivated to contribute the missing functionality.
And if you think you can contribute a feature you needed but didn't know where to place, that's possible too. More details about the project are available in README.md.
Of course, this doesn't diminish the significance of other projects that deal with Unicode support in Rust, but it may fill a niche need of providing ICU functionality until a fully rustful implementation is ready for use.
Best regards,
F