/qsort

Quicksort implemented as a C macro

Primary LanguageC++

qsort.h - Quicksort as a C macro

This is a traditional Quicksort implementation which for the most part follows Robert Sedgewick's 1978 paper. It is implemented as a C macro, which means that comparisons can be inlined. A distinctive feature of this implementation is that it works entirely on array indices, while actual access to the array elements is abstracted out with the less and swap primitives provided by the caller. Here is an example of how to sort an array of integers:

#include "qsort.h"
void isort(int A[], size_t n)
{
    int tmp;
#define LESS(i, j) A[i] < A[j]
#define SWAP(i, j) tmp = A[i], A[i] = A[j], A[j] = tmp
    QSORT(n, LESS, SWAP);
}

Since access to the actual array is so completely abstracted out, the macro can be used to sort a few dependent arrays (which, to the best of my knowledge, no other implementation can do):

#include "qsort.h"
int sortByAge(size_t n, const char *names[], int ages[])
{
    const char *tmpName;
    int tmpAge;
#define LESS(i, j) ages[i] < ages[j]
#define SWAP(i, j) tmpName  = names[i], tmpAge  = ages[i], \
                   names[i] = names[j], ages[i] = ages[j], \
                   names[j] = tmpName,  ages[j] = tmpAge
    QSORT(n, LESS, SWAP);
}

The sort is not stable (this is inherent to most of Quicksort variants). To impose order among the names with the same age, the LESS macro can be enhanced like this:

#define LESS(i, j) ages[i] <  ages[j] || \
                  (ages[i] == ages[j] && strcmp(names[i], names[j]) < 0)

This Quicksort implementation is written by Alexey Tourbin. The source code is provided under the MIT License.

Performance

A benchmark is provided which evaluates the performance of a few implementations: libc's qsort(3), STL's std::sort (denoted resp. stdlib and stl), Michael Tokarev's Inline QSORT() implementation, and this implementation (denoted resp. mjt and svpv). Michael Tokarev's implementation is based on an older glibc's version of Quicksort. Modern glibc versions, including the one used below, use merge sort.

A word of warning: this benchmark does only a tiny bit of averaging. For conclusive evidence, the program needs to be run multiple times.

By default, the bench program sorts 1M random integers.

$ make
g++ -O2 -g -Wall -fwhole-program -o bench bench.cc
./bench
stdlib     402584990    19644762
stl        230878632    25344013
mjt        272302466    24316349
svpv       245908342    23287211

The STL implementation turns out to be the fastest (the first column indicates the number of RDTSC cycles), despite the fact that it performs the largest number of comparisons (the second column).

One reason my implementation comes in second to STL is some if its design limitations. The swap macro issues three moves as a whole, while some parts of the algorithm, notably insertion sort, can benefit from copying items to the right one position rather than doing full exchanges. (It wouldn't be enough to factor swap into save, restore, and copy, though. After an item is saved to the temporary register, it is further required to compare other items to that temporary register, which less can't do.)

Of course, to a considerable degree, performance depends on the compiler being used. I found that my implementation is favoured by Clang, which also disrespects std::sort (using -O3 doesn't help). Sedgewick was right when he said we exposed ourselves to the whims of compilers.

$ rm -f bench && make CXX=clang
clang -O2 -g -Wall -fwhole-program -o bench bench.cc
clang: warning: optimization flag '-fwhole-program' is not supported
./bench
stdlib     414620784    19644762
stl        321896126    25344013
mjt        270644434    24316349
svpv       233669286    23287211

Relevant to performance is another characteristic of my implementation: it does not assume that comparisons are cheap, as with integers, and deliberately tries to reduce the number of comparisons when it is easily possible (specifically, during insertion sort, it does not trade boundary checks for extra comparisons). This pays off when comparisons are expensive, such as when comparing string keys with strcmp(3). In the following example, I use filenames and dependencies from the RPM database as the set of strings to be sorted, shuffling them with shuf(1).

$ rpm -qa --list --requires --provides | shuf >lines
$ wc -l <lines
396681
$ ./bench --strcmp <lines
stdlib     551992820    6883350
stl        607697244    8705273
mjt        572559714    8442262
svpv       535972516    8127244

My implementation comes in first now, and std::sort falls behind by a big margin! Note that qsort(3) makes even fewer comparisons (this is inherent to merge sort as opposed to any Quicksort variant), which makes it come in second. Again, Clang favours my implementation even more.

In the above examples, GCC 6.3.1 and Clang 3.8.0 have been used on a Haswell CPU.