jens-maus/libcodesets

Add sorting functionality

afalkenhahn opened this issue · 1 comments

Codesets should really have sorting capabilities because AFAICS there's currently no Amiga library which offers full UTF-8 collation.

I think such a feature would be way beyond the scope of codesets.library. Its main purpose is to convert arbitrary text between different codesets, nothing more, nothing less.
Sorting is a completely different matter, especially when it comes to UTF-8 support.

A simple search for "compare utf8 strings c++" turns up results like these:
https://stackoverflow.com/questions/7146405/why-the-comparision-of-two-strings-in-utf8-is-not-correct
https://stackoverflow.com/questions/7141417/how-can-i-compare-utf8-string-such-as-persian-words-in-c

To make it short: comparing UTF-8 strings takes far more than a simple byte-wise string comparison, and it is a completely different task from codeset conversion.
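To illustrate the point, here is a minimal sketch of what a plain byte-wise comparison does with UTF-8 input. The helper name `bytewise_compare` is made up for this example and is not part of codesets.library; it merely wraps the standard `strcmp()`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of codesets.library):
 * compare two NUL-terminated UTF-8 strings byte by byte, exactly as
 * strcmp() does. The result follows the numeric byte values, which for
 * valid UTF-8 means code point order, not any language's sorting rules.
 * Returns <0, 0 or >0. */
int bytewise_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b);
}
```

With this, "ä" (bytes `C3 A4`) sorts after "z" (byte `7A`), although a German reader would expect it near "a". Likewise a precomposed "é" (`C3 A9`) and "e" followed by a combining acute accent (`65 CC 81`) compare as different strings even though they look identical on screen. Handling such cases is what real collation has to do.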

If you know that a UTF-8 text can more or less be converted to plain ASCII, then the comparison becomes simple: convert both strings to ASCII or any other suitable 8-bit codeset and compare the converted strings. Of course this will never be 100% correct for arbitrary UTF-8 strings, but it should be good enough for more than 98% of the strings in a typical Amiga environment.
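A rough sketch of this convert-then-compare approach could look like the following. It assumes ISO-8859-1 as the target codeset and uses two hypothetical helpers, `utf8_to_latin1` and `compare_via_latin1`, which are stand-ins written for this example. In a real program the conversion step would be done by codesets.library itself rather than by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to ISO-8859-1.
 * Code points above U+00FF and malformed sequences are replaced by '?'.
 * Output is truncated to fit dstsize and always NUL-terminated.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t utf8_to_latin1(const char *src, char *dst, size_t dstsize)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL && dstsize > 0);

    while (*s != '\0' && n + 1 < dstsize) {
        unsigned char c = *s++;

        if (c < 0x80) {
            dst[n++] = (char)c;                 /* plain ASCII */
        } else if ((c == 0xC2 || c == 0xC3) && (*s & 0xC0) == 0x80) {
            /* two-byte sequence encoding U+0080..U+00FF */
            dst[n++] = (char)(((c & 0x03) << 6) | (*s++ & 0x3F));
        } else {
            dst[n++] = '?';                     /* not representable */
            while ((*s & 0xC0) == 0x80)
                s++;                            /* skip continuation bytes */
        }
    }
    dst[n] = '\0';
    return n;
}

/* Hypothetical helper: compare two UTF-8 strings after converting both
 * to ISO-8859-1. Inputs longer than 255 converted bytes are truncated. */
int compare_via_latin1(const char *a, const char *b)
{
    char bufa[256], bufb[256];

    utf8_to_latin1(a, bufa, sizeof bufa);
    utf8_to_latin1(b, bufb, sizeof bufb);
    return strcmp(bufa, bufb);
}
```

The sketch also shows where the remaining inaccuracy comes from: every character outside the target codeset collapses to '?', so "€" and "™" end up comparing as equal. And since `strcmp()` still orders by byte value, a locale-aware comparison of the converted 8-bit strings (on AmigaOS, for example, something like locale.library's `StrnCmp()`) would probably be the better final step.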