Comparison to Timsort and Block sort
dumblob opened this issue · 9 comments
There is TimSort (used by Python) which looks a bit similar in the ideas (based on merge sort), SLOC count, stability and probably performance to quadsort.
Dare to make a comparison? You can take CPython implementation for this (or if you're an adventurous type, implement it based on Julia TimSort as it's well readable unlike other implementations I know of).
Other potential candidate(s) for comparison would be "Block sort" (e.g. https://github.com/BonzaiThePenguin/WikiSort/tree/master or https://github.com/Mrrl/GrailSort which I slightly prefer over WikiSort). A slightly biased (read "not based on measurements") opinion on WikiSort gives the author of Timsort.
qsort() is the function to beat, anyone not up to that challenge is playing for 2nd place, or as of this week, 3rd place.
Well, Timsort (and Block sort kinda as well) is "like" qsort. So, challenge accepted?
What I'm trying to say is that it took me 4 years to write a sorting algorithm that could beat qsort() on equal terms.
It's like I made the highest jump on Earth and people insist it's not valid unless I do it on the Moon as well. I hope you can understand it seems a little surreal to me.
What I'm trying to say is that it took me 4 years to write a sorting algorithm that could beat qsort() on equal terms.
That's fine and it's good to beat qsort()
. On the other hand, qsort()
is well known to have several deficiencies which other sorting algorithms (incl. Timsort) try to solve. In other words, beating qsort()
on equal terms is for almost two decades not any more the goal when designing a sorting algorithm, because there are other and harder challenges.
Judging based on https://www.reddit.com/r/programming/comments/f3d5q0/quadsort_introduction_to_a_new_stable_sorting/fhl8ww9/ (though the "benchmark" overall incl. its tiny magnitude is still way too simplistic to make a better overview), Quadsort is not the fastest one in some cases by a solid margin (but faster by about the same margin in other cases) among single-threaded sorting algorithms. Unfortunately Timsort is not there even though it user very similar ideas to speed up the canonical merge sort.
Btw. as others have (somewhat incorrectly) pointed out, that you should use specialization/templates/similar techniques and thus a different language than C to avoid pointer passing issues (dereference is slow as it basically "quasi-randomly" jumps over memory which is a constant factor, but worse it's highly cache inefficient even for millions of items), I actually think your choice was very good and you should definitely stick with C, but start using its full potential by using C99 restrict
for all pointers (and if not possible, then make it possibly by passing compound data rather than "deconstructed/unwrapped" data/structs) and of course inline
where it makes sense. Without restrict
there is too many unnecessary memory loads and stores.
I am getting a split decision on a macos Catalina, compiled using Apple clang version 11.0.0 (clang-1100.0.33.17)
➜ quadsort git:(master) ./a.out
quadsort: sorted 1000000 elements in 0.155557 seconds. (random order)
qsort: sorted 1000000 elements in 0.144936 seconds. (random order)
quadsort: sorted 1000000 elements in 0.005193 seconds. (forward order)
qsort: sorted 1000000 elements in 0.006011 seconds. (forward order)
quadsort: sorted 1000000 elements in 0.040907 seconds. (reverse order)
qsort: sorted 1000000 elements in 0.030135 seconds. (reverse order)
quadsort: sorted 1000000 elements in 0.045456 seconds. (random tail)
qsort: sorted 1000000 elements in 0.081248 seconds. (random tail)
➜ quadsort git:(master) ./a.out
quadsort: sorted 1000000 elements in 0.155336 seconds. (random order)
qsort: sorted 1000000 elements in 0.145207 seconds. (random order)
quadsort: sorted 1000000 elements in 0.005184 seconds. (forward order)
qsort: sorted 1000000 elements in 0.006011 seconds. (forward order)
quadsort: sorted 1000000 elements in 0.040963 seconds. (reverse order)
qsort: sorted 1000000 elements in 0.030163 seconds. (reverse order)
quadsort: sorted 1000000 elements in 0.045405 seconds. (random tail)
qsort: sorted 1000000 elements in 0.081733 seconds. (random tail)
You'll get somewhat different results for each compiler / system.
Qsort 100000 unknown_32 86 11657430.2 ns/op MO: 148609837
Quadsort 100000 unknown_32 94 10713883.0 ns/op MO: 152641767
Introsort 100000 unknown_32 93 10868172.0 ns/op MO: 176354336
Merge_sort 100000 unknown_32 89 11298089.9 ns/op MO: 147022229
Heap_sort 100000 unknown_32 88 11415965.9 ns/op MO: 283323456
Shell_sort 100000 unknown_32 53 19063754.7 ns/op MO: 135416015
Tim_sort 100000 unknown_32 77 13039363.6 ns/op MO: 134388364
Merge_sort_in_place 100000 unknown_32 81 12414308.6 ns/op MO: 145139842
Grail_sort 100000 unknown_32 68 14830073.5 ns/op MO: 110715760
Sqrt_sort 100000 unknown_32 84 11915619.0 ns/op MO: 143224346
Grail_sort_dyn_buffer 100000 unknown_32 79 12745696.2 ns/op MO: 128632770
Above is a benchmark against a timsort and introsort implementation. I had to make several changes:
The original implementation was in-lining the comparisons, which is one way to beat qsort(), but not a valid one.
The original implementation was using lrand48() which I changed to rand() which should have better entropy.
MO: lists the number of comparisons. This in itself isn't the full equation because the way the algorithm moves data is equally important, but more difficult to measure.
While the Introsort is fast it does have notably more comparisons. So it's faster when inlining and comparing integers, but only for random data, besides that it is not stable and may have DOS vulnerabilities.
Quadsort for some reason is much faster than Introsort on arrays under 1000 elements, I'm still trying to figure out why. Might be cache related.
Grailsort and Timsort look interesting for applications with very heavy comparisons, but I may not be catching every comparison.
qsort() is a moving goal, because it's not required to be quicksort.
- glibc uses mergesort, which makes it stable (unless it runs out of memory, at which point it switch to unstable heapsort) and is now causing trouble for anyone wanting to swap it out. https://lwn.net/Articles/960309/
- musl uses smoothsort (comment section of link above)
- macos libc uses freebsd-derived code in https://github.com/apple-open-source-mirror/Libc/blob/5e566be7a7047360adfb35ffc44c6a019a854bea/stdlib/FreeBSD/qsort.c, ultimately Bentley & McIlroy's "Engineering a Sort Function". This is finally a classic fast-3-way quicksort.
That said, beating glibc's mergesort while being stable is all quad needs to be a reasonable replacement of it. It's even stable without extra heap allocation!
Interesting how future qsort() updates are now pretty much forced to be stable for backward compatibility.
There's nothing that stops glibc from incorporating quadsort or blitsort for 8 byte types, but I haven't bothered to personally invest the time as there's no guarantee it'll pass the bureaucratic hurdles.