ashvardanian/StringZilla

Faster sorting algorithm

ashvardanian opened this issue · 0 comments

The original sorting algorithm in StringZilla had two passes - Radix and QuickSort. The first one sorted strings by the first 32 bits, and the second operated on chunks with equivalent prefixes. For QuickSort, we use the qsort function, which has different signatures on different Operating Systems.

#if defined(WIN32) || defined(_WIN32) || defined(__WIN32__) || defined(__NT__)
        qsort_s(sequence->order, split, sizeof(sz_size_t), qsort_comparator, (void *)sequence);
        qsort_s(sequence->order + split,
                sequence->count - split,
                sizeof(sz_size_t),
                qsort_comparator,
                (void *)sequence);
#elif __APPLE__
        qsort_r(sequence->order, split, sizeof(sz_size_t), (void *)sequence, qsort_comparator);
        qsort_r(sequence->order + split,
                sequence->count - split,
                sizeof(sz_size_t),
                (void *)sequence,
                qsort_comparator);
#else
        qsort_r(sequence->order, split, sizeof(sz_size_t), qsort_comparator, (void *)sequence);
        qsort_r(sequence->order + split,
                sequence->count - split,
                sizeof(sz_size_t),
                qsort_comparator,
                (void *)sequence);
#endif

It's ugly and slow, especially compared to the std::sort in the C++ Standard Templates Library. Let's replace that with an excellent specialized algorithm to achieve higher performance! After some initial testing, I know that doubling the speed is possible, but there may be even more room for improvement!