Implement sorting algorithms
certik opened this issue · 31 comments
At the very least there needs to be some quick algorithm for sorting integers and reals. But I think we can implement several algorithms and the user can choose.
This will also be needed to implement #38.
Is it also an option to link to the algorithms provided by BLAS/LAPACK?
Here are inefficient implementations from fortran-utils: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/sorting.f90, the API however might be useful to get inspiration from. Here is more efficient implementation of quicksort: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/sorting.f90#L165. I think there will be much better implementations out there, including in Lapack.
Here is a mutli-threaded one:
https://balsoftware.net/index.php/open-source/multi-thread-sort/
Not sure what is the license.
The following routines implement the version of quicksort outlined in Hanson's and Hopkins book and on the associated web site (http://www.siam.org/books/ot134). The publishers license is very permissive (at least as I read it). This version supports REAL32, REAL64, INT8, INT16, INT32, INT64, Character strings and arrays, and a user defined type. Unfortunately, the file tool for this page does not know anything about .f90 or .F90 file extensions. Here is the code:
Main quicksort module: type specific routines are overloaded with generic qsort interface. Sorting can be in place or return a integer permutation array leaving original array untouched. Sorting can be in either ascending or descending order.
A base user class that can be extended to define user specific classes. Specifies dummy methods for relational operators needed for sorting etc, a print method and an assignment method.
A test program that tests sorting integers, reals, characters, and a user point class that is sorted on distance (Euclidean Norm)
The point class used in test program. It is extended from the base User class
Here are some of mine:
- https://github.com/jacobwilliams/fortran-search-and-sort
- https://github.com/jacobwilliams/stringsort (see also here about natural sorting of strings).
A few more:
Sorting routines in various libraries
- ACM
- Algorithm 402 (version linked here is part of CLAWPACK)
- HSL
- IBM Engineering and Scientific Subroutine Library (ESSL)
- IMSL
- LAPACK
- NAG
- Numerical Recipes
- ORDERPACK
- SLATEC (search for the routines
ipsort,isort,spsort,ssort,dpsort,dsort, andhpsort)
Various user routines
- FlashSort (also see article in Dr.Dobbs or the Wikipedia page)
- Sorting in Fortran by Michael Rutter
- sorting programs in Fortran by Jean-Pierre Moreau
- Sorting by John Mahaffy
- FORTRAN Bubble Sort Speghetti Code by Thomas W Bennet
- MULTI-THREADED SORT by Bal Software
- quicksort by Tim Hoar
- lcpsort - Module for sorting and merging ASCII text strings by N. Shamsundar, University of Houston (a C version by the same author is provided here)
- qsort_inline by Joseph M. Krahn at FortranWiki
- Coding Fortran: a sorting module by Michael Wirth
- Parsing and Sorting with Fortran by Mike Rourke
- quicksort by @t-nissie and a modified version by @1AdAstra1
- Selection sort by C.-K. Shene
- LexicalSort by unknown author from comp.lang.fortran newsgroup
- Ancora sul sort di ‘Compilati vs. Interpretati’ – 2; blog post comparing various Fortran sort implementations (in Italian!)
- M_sort by @urbanjost
- simple_fortran_argsort by @Astrokiwi
- Bubble Sort (WikiBooks)
- Gnome sort (Rosetta Code)
- Insertion sort (Rosetta Code)
Internet forums and discussion boards
- ordering function in Fortran (comp.lang.fortran)
- Reverse quick sort in Fortran (StackExchange)
- Sorting an array from lowest to greatest FORTRAN (StackExchange)
- Fortran Bubble Sort Algorithm (StackExchange)
- Open-source version of Intel Fortran sort routines (StackExchange) provided by Rafik Zurob
Interfaces to C qsort
Published works
- Books
- Algorithms and Data Structures in F and Fortran by Robin Vowels (see chapter 1)
- Introduction to Programming with Fortran by Ian Chivers and Jane Sleightholme (see pp. 703-732)
- Fortran 90 & 95 Array and Pointer Techniques (eBook) by Loren P. Meissner (see chapters 2 and 3)
- Articles
FWIW I needed a sorting procedure to sort the results of of hashing functions to estimate their tendency to produce conflicting hashes. I have implemented four sorting algorithms for sorting rank one arrays of kinds INT8, INT16, INT32, INT64, SP, DP, and QP.
- A hybrid of quicksort using median of three for the pivot and insertion sort for the leaves. There should be no problem with distributing this algorithm with stdlib, but I have my issues with its performance on some arrays
- A version of the introsort of David Musser used as the standard sorting for some C++ standard libraries. This is a hybrid algorithm using insertion sort for the leaves of the sort tree, quicksort by default for the main body, and heapsort if quicksort is not converging quickly. I have permission from David Musser to incorporate my version of introsort into the standard library.
- A algorithm inspired by timsort of Tim Peters used in Python, Java, and other languages. While it claims to be timsort it is not, but has very good performance for a mergesort base sort, particularly on partly sorted data. It's sorting is not in place and stable. It requires temporary space of size N. In particular, it doesn't do the search for runs or the galloping of the original. I call this algorithm simplified timsort. I have asked the author of this algorithm, Aditya Kumar, for permission to use it.
- A version of timsort largely a translation of the original C code to Fortran 2008. This code has at least one error discussed by N. Auger, C. Nicaud, and C. Pivoteau. Merge Strategies: from Merge Sort to TimSort. 2015. hal-01212839v2. that is fixed in my translation. I have written to two old email addresses of Tim Peters asking for permission to use my translation in stdlib, but have not heard back.
I have implemented eight sets of arrays for testing the codes, all of length 2**20.
- Random - a set of integers from 0 to 2**20 - 1 in random order.
- Identical - all integers have the same value of 10.
- Increasing - values increase uniformly from 0 to 2**20-1.
- Decreasing - values decrease uniformly from 2**20-1 to 0.
- Random-sparse - the integers are generated randomly from a set of values from 0 to 2**22-1 so duplicates are sparse.
- Random-dense - the integers are generated randomly from a set of values from 0 to 2**19-1 so duplicates are dense.
- Random-3 - the Increasing array has 3 random exchanges of individual elements.
- Random-10 - the final ten elements of the increasing array are replaced by random values.
The quicksort array sometimes exhibits quadratic behavior on the Identical, Randon-3, and Random-10 arrays so I turned off testing for this sort on these three arrays. The results of the testing are summarized by the following table. The times are averages of eight consecutive runs.
| Type | Elements | Order | Method | Time (s) |
|---|---|---|---|---|
| Integer | 1048576 | Random order | QuickSort | 0.14268 |
| Integer | 1048576 | Increasing | QuickSort | 0.04419 |
| Integer | 1048576 | Decreasing | QuickSort | 0.13219 |
| Integer | 1048576 | Random sparse | QuickSort | 0.14551 |
| Integer | 1048576 | Random dense | QuickSort | 0.14235 |
| Integer | 1048576 | Random order | Intro_Sort | 0.14291 |
| Integer | 1048576 | Identical | Intro_Sort | 0.14412 |
| Integer | 1048576 | Increasing | Intro_Sort | 0.04812 |
| Integer | 1048576 | Decreasing | Intro_Sort | 0.12334 |
| Integer | 1048576 | Random sparse | Intro_Sort | 0.14504 |
| Integer | 1048576 | Random dense | Intro_Sort | 0.14413 |
| Integer | 1048576 | Random 3 | Intro_Sort | 0.10890 |
| Integer | 1048576 | Random 10 | Intro_Sort | 0.34179 |
| Integer | 1048576 | Random order | Sim_Tim_Sort | 0.16737 |
| Integer | 1048576 | Identical | Sim_Tim_Sort | 0.06190 |
| Integer | 1048576 | Increasing | Sim_Tim_Sort | 0.06276 |
| Integer | 1048576 | Decreasing | Sim_Tim_Sort | 0.09906 |
| Integer | 1048576 | Random sparse | Sim_Tim_Sort | 0.16657 |
| Integer | 1048576 | Random dense | Sim_Tim_Sort | 0.16756 |
| Integer | 1048576 | Random 3 | Sim_Tim_Sort | 0.06303 |
| Integer | 1048576 | Random 10 | Sim_Tim_Sort | 0.06194 |
| Integer | 1048576 | Random order | Tim_Sort | 0.22234 |
| Integer | 1048576 | Identical | Tim_Sort | 0.00234 |
| Integer | 1048576 | Increasing | Tim_Sort | 0.00232 |
| Integer | 1048576 | Decreasing | Tim_Sort | 0.00378 |
| Integer | 1048576 | Random sparse | Tim_Sort | 0.20528 |
| Integer | 1048576 | Random dense | Tim_Sort | 0.20005 |
| Integer | 1048576 | Random 3 | Tim_Sort | 0.00353 |
| Integer | 1048576 | Random 10 | Tim_Sort | 0.00507 |
In summary,
- quicksort has good performance where it does not have quadratic behavior, and has particularly good performance on the uniformly increasing array and slightly better performance on the uniformly decreasing array.
- introsort roughly matches quicksort performance, except that it is stable where quicksort has quadratic behavior.
- Simplified timsort has about 20% slower performance on random data, but consistently very good at data with significant stretches of ordered data.
- Timsort has about 80% slower performance than introsort on random data, but extraordinary performance, essentially O(N), on data with significant stretches of ordered data.
FWIW timsort puts a lot of effort into reducing the number of comparisons of elements of the array/list at the expense of some convoluted logic. I suspect this effort is wasted on intrinsic elements and some simplification of the code would improve its performance on random data with minimal impact on its performance on largely sorted data.
I have implemented a fifth sorting routine, a translation of the Rust sorting routine to Fortran 2008. The Rust sort is also inspired by Tim Peters' Timsort algorithm, but drops the "galloping" used to rapidly find the minimum range to be sorted. It is released under the Apache License, Version 2.0 or the MIT license so the translation can be easily incorporated into the standard library. Rust sort is comparable to the simplified Tim Sort on random data, and the true Tim Sort on partially sorted data. I suggest that that a standard library sorting routines include at least two routines: one like Introsort for data that is expected to be random; and one like Rust sort for data that will often have presorted sections.
| Type | Elements | Order | Method | Time (s) |
|---|---|---|---|---|
| Integer | 1048576 | Random order | QuickSort | 0.14655 |
| Integer | 1048576 | Increasing | QuickSort | 0.04457 |
| Integer | 1048576 | Decreasing | QuickSort | 0.12806 |
| Integer | 1048576 | Random sparse | QuickSort | 0.14349 |
| Integer | 1048576 | Random dense | QuickSort | 0.14277 |
| Integer | 1048576 | Random order | Intro_Sort | 0.14568 |
| Integer | 1048576 | Identical | Intro_Sort | 0.14389 |
| Integer | 1048576 | Increasing | Intro_Sort | 0.05068 |
| Integer | 1048576 | Decreasing | Intro_Sort | 0.13336 |
| Integer | 1048576 | Random sparse | Intro_Sort | 0.15501 |
| Integer | 1048576 | Random dense | Intro_Sort | 0.15246 |
| Integer | 1048576 | Random 3 | Intro_Sort | 0.09962 |
| Integer | 1048576 | Random 10 | Intro_Sort | 0.30596 |
| Integer | 1048576 | Random order | Sim_Tim_Sort | 0.16140 |
| Integer | 1048576 | Identical | Sim_Tim_Sort | 0.06535 |
| Integer | 1048576 | Increasing | Sim_Tim_Sort | 0.06486 |
| Integer | 1048576 | Decreasing | Sim_Tim_Sort | 0.10153 |
| Integer | 1048576 | Random sparse | Sim_Tim_Sort | 0.15755 |
| Integer | 1048576 | Random dense | Sim_Tim_Sort | 0.16222 |
| Integer | 1048576 | Random 3 | Sim_Tim_Sort | 0.06694 |
| Integer | 1048576 | Random 10 | Sim_Tim_Sort | 0.06540 |
| Integer | 1048576 | Random order | Tim_Sort | 0.20426 |
| Integer | 1048576 | Identical | Tim_Sort | 0.00261 |
| Integer | 1048576 | Increasing | Tim_Sort | 0.00249 |
| Integer | 1048576 | Decreasing | Tim_Sort | 0.00417 |
| Integer | 1048576 | Random sparse | Tim_Sort | 0.20701 |
| Integer | 1048576 | Random dense | Tim_Sort | 0.20557 |
| Integer | 1048576 | Random 3 | Tim_Sort | 0.00495 |
| Integer | 1048576 | Random 10 | Tim_Sort | 0.00391 |
| Integer | 1048576 | Random order | Rust_Sort | 0.16567 |
| Integer | 1048576 | Identical | Rust_Sort | 0.00204 |
| Integer | 1048576 | Increasing | Rust_Sort | 0.00203 |
| Integer | 1048576 | Decreasing | Rust_Sort | 0.00354 |
| Integer | 1048576 | Random sparse | Rust_Sort | 0.16542 |
| Integer | 1048576 | Random dense | Rust_Sort | 0.15886 |
| Integer | 1048576 | Random 3 | Rust_Sort | 0.00818 |
| Integer | 1048576 | Random 10 | Rust_Sort | 0.00412 |
I think having two or even three of these would be great additions to stdlib. The occurence of pre-sorted sections is typical in indexes of sparse matrices for discretized PDE problems using irregular meshes.
I just uploaded a copy of a hybrid sort (quick sort + insertion sort) by Houlsby & Sloan (both are professors of civil engineering) to a public repository: https://github.com/ivan-pi/hs-sort/. Since I could not figure out the journal licensing conditions, I have sent an email to the original author asking for permission to share the routines under a permissive license. For an array of 1048576 random ordered doubles, the timing is roughly 0.2 seconds, so on the same order as your routines.
hey what is it alll about, i have just started in open source and came across this, could you guide me ahead?
@MUzairS15 This page is an issue in a Git repository devoted to providing a Standard Library for Fortran, similar to standard libraries for C, C++, Java etc. This repository may be of interest to you if you are a Fortran programmer interested extending the language's capabilities. Most of the library code is in Fortran extended by the FYPP preprocessor. Most uses of FYPP in the library are rather simple and easy to learn. The Fortran code can be sophisticated and requires a good knowledge of most of Fortran 2008 other than coarrays. Some limited C++/C processing may also be present using C interoperability.
The issue this page addresses is providing sorting subroutines for Fortran. I have developed routines for sorting arrays of intrinsic type elements, other than character, and am exploring their incorporation into the standard library.
I have tentatively renamed the sorting routines: RUST_SORT => ORD_SORT signifying its better performance on sorting ordered arrays, and INFO_SORT => UNORD_SORT signifying its better performance on randomly ordered arrays. I have decided to extend the capabilities of the sorting routines by adding a new routine to generate an array of sorting indices, so that an array of a derived type can be indirectly sorted based on the sorting of an element of that derived type. As stable sorting may be important for derived types, I have baed this routine on ORD_SORT and have tentatively named this subroutine ORD_SORTING.
Testing of ORD_SORTING immediately ran into a problem in that it generated a segmentation fault. I now believe that this fault was due to the small default stack size on Mac OS X. ORD_SORTING i believe uses over three times the memory of ORD_SORT, which in turn usually uses more memory than UNORD_SORTING. Raising the stack size to over 65 Mbytes appears to have fixed the problem in sorting arrays of 2**20 elements. I have not tested it on larger arrays. In attempting to debug the segmentation fault, I have changed my compiler from gfortran 10.2 to ifort 18.03. I was pleased to find that performance with ifort as the compiler was about twice as fast as with gfortran. The ifort runtimes are
| Type | Elements | Order | Method | Time (s) |
|---|---|---|---|---|
| Integer | 1048576 | Random order | Ord_Sort | 0.09484 |
| Integer | 1048576 | Identical | Ord_Sort | 0.00066 |
| Integer | 1048576 | Increasing | Ord_Sort | 0.00064 |
| Integer | 1048576 | Decreasing | Ord_Sort | 0.00112 |
| Integer | 1048576 | Random sparse | Ord_Sort | 0.09372 |
| Integer | 1048576 | Random dense | Ord_Sort | 0.09464 |
| Integer | 1048576 | Random 3 | Ord_Sort | 0.00317 |
| Integer | 1048576 | Random 10 | Ord_Sort | 0.00141 |
| Integer | 1048576 | Random order | Unord_Sort | 0.06738 |
| Integer | 1048576 | Identical | Unord_Sort | 0.03666 |
| Integer | 1048576 | Increasing | Unord_Sort | 0.01244 |
| Integer | 1048576 | Decreasing | Unord_Sort | 0.03999 |
| Integer | 1048576 | Random sparse | Unord_Sort | 0.06779 |
| Integer | 1048576 | Random dense | Unord_Sort | 0.06999 |
| Integer | 1048576 | Random 3 | Unord_Sort | 0.07607 |
| Integer | 1048576 | Random 10 | Unord_Sort | 0.13818 |
| Integer | 1048576 | Random order | Ord_Sorting | 0.11640 |
| Integer | 1048576 | Identical | Ord_Sorting | 0.00322 |
| Integer | 1048576 | Increasing | Ord_Sorting | 0.00309 |
| Integer | 1048576 | Decreasing | Ord_Sorting | 0.00437 |
| Integer | 1048576 | Random sparse | Ord_Sorting | 0.11340 |
| Integer | 1048576 | Random dense | Ord_Sorting | 0.11387 |
| Integer | 1048576 | Random 3 | Ord_Sorting | 0.00784 |
| Integer | 1048576 | Random 10 | Ord_Sorting | 0.00415 |
A few more user wishes concerning sorting are in j3-fortran/fortran_proposals#101.
@certik @ivan-pi @rweed @jacobwilliams @jvdp1
You have all expressed interest in having the standard library include a module for sorting data. I now have a reasonable draft of such a module and a testing program, but before it is pulled into the standard library the following issues should be addressed:
- Is
stdlib_sortingthe ideal name for the module; - Should the integer type used for indexing be
INT32orINT64; - Should the integer parameter used for indexing be named
INT_SIZEor something else; - What is the best name for the following generic subroutines identified by their tentative names:
A.UNORD_SORT, a subroutine optimized for sorting near random data,
B.ORD_SORT, a subroutine optimized for sorting data that has been partially sorted, and
C.ORD_SORTING, a subroutine designed to generate an array of indices that would sort the input array (this is intended to be used to sort an array of a derived type based on the values of a component of that type); - The current testing program exceeds the default Mac OS X stack limit of 8192 kbytes when running
ORD_SORTINGon at least one test case with an input array of2**20elements, but has no problems with the maximum stack limit of 65532 kbytes. This will cause problems in testing on other computers. Partial fixes for this problem include:
A. Setting the stack size larger than the default in running the tests,
B. Test using a test case of significantly fewer than2**20elements,
C. Setting the integer type used for indexing toINT32fromINT64reducing memory usage inORD_SORTINGby about one third,
D. Set the outputINDEXarray forORD_SORTINGtoINTENT(INOUT), rather than an allocatableINTENT(OUT)so that theINDEXarray can be allocated outside of the routine in static memory rather than on the stack, reducing memory on the stack by up to a factor of two, or
E. Set the inputARRAYforORD_SORTINGtoINTENT(INOUT)rather than the currentINTENT(IN)which necessitates allocating a temporary array to hold the values ofARRAYduring sorting. ORD_SORTandORD_SORTINGallocate memory and callERROR STOPif allocation fails instead of returning an error flag.
Several things of minor note:
- The codes are about twice as fast when compiled with ifort 18.0.3 than when compiled with gfortran 10.2.0 with the largest relative differences for
UNORD_SORT. - When compiled with ifort
UNORD_SORTis about 40% faster thanORD_SORTon near random data. ORD_SORTis several orders of magnitude faster thanUNORD_SORTon data with large sections of pre-sorted data.
The current draft specification document follows:
title: Sorting Procedures
The stdlib_sorting module
(TOC)
Overview of sorting
The sorting of collections of data is useful in the analysis of the
collections.
With its absence of generics and limited polymorphism, it is
impractical in current Fortran to provide sorting routines for
arbitrary collections of arbitrary types of data.
However Fortran's arrays are by far its most widely used collection,
and arrays of arbitrary types of data can often be sorted in terms of
a single component of intrinsic type.
The Fortran Standard Library therefore provides a module,
stdlib_sorting with procedures to sort arrays of single intrinsic
numeric types, i.e. the different kinds of integers and reals.
Overview of the module
The module stdlib_sorting defines several public entities one
default integer parameter, int_size, and three overloaded
subroutines: ORD_SORT, UNORD_SORT, and ORD_SORTING. The
overloaded subroutines also each have seven specific names for
versions with differend types of array. arguments.
The int_size parameter
The int_size parameter is used to specify the kind of integer used
in indexing the various arrays. Currently the module sets int_size
to the value of int64 from the intrinsic ISO_FORTRAN_ENV module.
For many applications a value of INT32 would be sufficient for
addressing and would save some stack space for the subroutines,
The module subroutines
The stdlib_sorting module provides three different overloaded
subroutines intended to sort three different kinds of arrays of
data:
UNORD_SORTis intended to sort simple arrays of intrinsic data
that is effectively unordered before the sort;ORD_SORTis intended to sort simple arrays of intrinsic data
that has significant sections that were partially ordered before the
sort; andORD_SORTINGis intended to provide indices for sorting arrays of
derived type data, based on the ordering of an intrinsic component
of the derived type.
The UNORD_SORT subroutines
UNORD_SORT uses the [introsort sorting algorithm of David Musser]
(http://www.cs.rpi.edu/~musser/gp/introsort.ps). introsort is a hybrid
unstable comparison algorithm combining quicksort, insertionsort, and
heapsort. While this algorithm is always O(N Ln(N)) it is relatively
fast on randomly ordered data, but inconsistent in performance on partly
sorted data. David Musser has given permission to include a variant of
introsort in the Fortran Standard Library under the MIT license provided
we cite:
Musser, D.R., “Introspective Sorting and Selection Algorithms,”
Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
as the official source of the algorithm.
As with introsort, UNORD_SORT is an unstable hybrid
algorithm using insertion sort for the outer branches of the sort tree,
by default using quicksort for the main body of the sort tree, but if
quicksort is converging too slowly the algorithm resorts
to heap sort. The algorithm is of order O(N Ln(N)) for all inputs.
Because it relies on quicksort, the coefficient of the O(N Ln(N))
behavior is typically small compared to other sorting algorithms.
The algorithm shows relatively little performance difference (factors
of two to four) between arrays of "random" data and arrays with
partially sorted data.
The ORD_SORT subroutine
ORD_SORT is a translation of the [rust sort sorting algorithm]
(https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
which in turn is inspired by the [timsort algorithm of Tim Peters]
(http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
ORD_SORT is a hybrid stable comparison algorithm combining mergesort,
and insertion sort. It is always at worst O(N Ln(N)) in sorting random
data, having a performance about 15-40% slower than UNORD_SORT on random
data, but has much better performance than UNORD_SORT on partially
sorted data, having O(N) performance on uniformly increasing or
decreasing data. The
rust sort implementation
is distributed with the header:
Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
file at the top-level directory of this distribution and at
http://rust-lang.org/COPYRIGHT.
Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
<LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
option. This file may not be copied, modified, or distributed
except according to those terms.
so the license for the original code is compatible with the use of
modified versions of the code in the Fortran Standard Library under the
MIT license.
As with timsort, ORD_SORT is a stable hybrid algorithm.
It begins by traversing the array starting in its tail attempting to
identify runs in the array, where a run is either a uniformly
decreasing sequence, ARRAY(i-1) > ARRAY(i), or non-decreasing,
ARRAy(i-1) <= ARRAY(i), sequence. Decreasing sequences are reversed
then, if the sequence has less than MIN_RUN elements, previous
elements in the array are added to the run using insertion sort
until the run contains MIN_RUN elements or the array is completely
processed. As each run is identified start and length of the run is
then pushed onto a stack and the stack is then processed using merge
until it obeys the stack invariants:
- len(i-2) > len(i-1) + len(i)
- len(i-1) > len(i)
ensuring that processing the stack is, at worst, of order O(N Ln(N)). However, because of the identification of decreasing and
non-decreasing runs, processing of structured data can be much faster,
with processing of uniformly decreasing or non-decreasing arrays being
of order O(N). The result in our tests is that ORD_SORT is about
15-40% slower than UNORD_SORT on purely random data, depending on
the compiler, but can be more than an order of magnitude faster than
UNORD_SORT on highly structured data. ORD_SORT allocates a
temporary array with half the elements of the input array, that can
cause problems with stack limits for large input arrays.
The ORD_SORTING subroutine
The UNORD_SORT and ORD_SORT subroutines can sort isolated arrays
of intrinsic types, but do nothing for the sorting of arrays of
derived types. For arrays of derived types what is useful is an array
of indices that maps the original array to an array sorted based on the
value of a component of the derived type. For such a sort a stable
sort is useful, therefore the module provides a subroutine,
ORD_SORTING, that generates such an array of such indices based on
the ORD_SORT algorithm. As ORD_SORTING is also based on the
rust sort algorithm the rust sort license must be acknowledged:
Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
file at the top-level directory of this distribution and at
http://rust-lang.org/COPYRIGHT.
Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
<LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
option. This file may not be copied, modified, or distributed
except according to those terms.
noting that the Fortran Standard Library is released under the MIT
license so that incorporation of the rust sort algorithm is
compatible with its license.
The logic of ORD_SORTING parallels that of ORD_SORT, with
additional housekeeping to keep the array of indices consistent with
the sorted positions of the input array. Because of this additional
housekeeping it has slower performance than the other two
routines. Because it uses several temporary arrays, it is also more
sensitive to stack limits than the other two routines.
Tentative specifications of the stdlib_sorting procedures
unord_sort - sorts an input array
Status
Experimental
Description
Returns an input array with the elements sorted in order of increasing
value.
Syntax
call unord_sort ( array )
Class
Pure generic subroutine.
Arguments
array : shall be a rank one array of any of the types:
integer(int8), integer(int16), integer(int32), integer(int64),
real(real32), real(real64), or real(real128). It
is an intent(inout) argument. On return its input elements will be
sorted in order of non-decreasing value.
Notes
UNORD_SORT implements a hybrid sorting algorithm combining
quicksort, merge sort, and insertion sort. For most purposes it
behaves like a quicksort with a median of three partition, providing
good O(N Ln(N)) run time performance for most random arrays, but
defaulting to merge sort if the structure of the array results in a
prediction of poor runtime performance.
ord_sort - sorts an input array
Status
Experimental
Description
Returns an input array with the elements sorted in order of increasing
value.
Syntax
call ord_sort ( array )
Class
Generic subroutine.
Arguments
array : shall be a rank one array of any of the types:
integer(int8), integer(int16), integer(int32), integer(int64),
real(real32), real(real64), or real(real128). It
is an intent(inout) argument. On return its input elements will be
sorted in order of non-decreasing value.
Notes
ORD_SORT implements a hybrid sorting algorithm combining
heapsort, and insertion sort. For most purposes it behaves like a
merge sort, providing worst case O(N Ln(N)) run time performance
for most random arrays, that is typically slower than UNORD_SORT.
However, if the array has significant runs of decreasing or
non-decreasing values performance can be much better than
UNORD_SORT, with O(N) behavior on uniformly decreasing, or
non-decreasing arrays.
ord_sorting - creates an arry of sorting indices for an input array.
Status
Experimental
Description
Returns an integer array whose elements would sort the input array in
the specified direction retaining order stability.
Syntax
call unord_sort ( array, index[, reverse, zero_based )
Class
Subroutine.
Arguments
array: shall be a rank one array of any of the types:
integer(int8), integer(int16), integer(int32), integer(int64),
real(real32), real(real64), or real(real128). It is an
intent(in) argument. It will be an array whose sorting
indices are to be determined.
index: shall be a rank one allocatable integer array of kind
int_size. It is an intent(out) argument. On return it shall be
of size(array) with values that shall be the indices needed to sort
array in the desired direction.
reverse (optional): shall be a scalar of type default logical. It
is an intent(in) argument. If present with a value of .true. then
the indices in index will sort array in order of non-increasing
values in stable order. Otherwise the indices will sort array in
order of non-decreasing values in stable order.
zero_based (optional): shall be a scalar of type default logical.
It is an intent(in) argument. If present with the value .true.
the values of index will be for zero based array indexing,
otherwise the values of index will be for one's based array indexing.
Notes
ORD_SORTING implements the hybrid sorting algorithm of ORD_SORT.
As a merge sort based algorithm it is a stable sorting comparison
algorithm. It uses several temporary arrays that can cause problems
with stack limits.
I was wondering if a work array can be implemented for this (work array will decrease the number of temporary arrays): https://en.wikipedia.org/wiki/Merge_sort#Top-down_implementation
Work arrays would certainly help stack memory usage of ORD_SORT and ORD_SORTING (which requires more than one work array). If I resort to minimizing memory usage then their APIs would be:
ord_sort - sorts an input array
Status
Experimental
Description
Returns an input array with the elements sorted in order of increasing
value.
Syntax
call ord_sort ( array, work )
Class
Generic subroutine.
Arguments
array : shall be a rank one array of any of the types:
integer(int8), integer(int16), integer(int32), integer(int64),
real(real32), real(real64), or real(real128). It
is an intent(inout) argument. On input it is the array to be
sorted. On return its elements will be
sorted in order of non-decreasing value.
work (optional): shall be a rank one array of any of the same type as array,
and shall have at least size(array)/2 elements. It
is an intent(out) argument. Its contents on return are undefined.
Notes
ORD_SORT implements a hybrid sorting algorithm combining
heapsort, and insertion sort. For most purposes it behaves like a
merge sort, providing worst case O(N Ln(N)) run time performance
for most random arrays, that is typically slower than UNORD_SORT.
However, if the array has significant runs of decreasing or
non-decreasing values performance can be much better than
UNORD_SORT, with O(N) behavior on uniformly decreasing, or
non-decreasing arrays. If array is of any type REAL the behavior of
the sorting is undefined if any element of array is a NaN. The optional
work array replaces "scratch" memory that would otherwise be
allocated on the stack.
ord_sorting - creates an arry of sorting indices for an input array.
Status
Experimental
Description
Returns an integer array whose elements would sort the input array in
the specified direction retaining order stability.
Syntax
call ord_sorting ( array, index[, work, iwork, reverse, zero_based )
Class
Subroutine.
Arguments
array: shall be a rank one array of any of the types:
integer(int8), integer(int16), integer(int32), integer(int64),
real(real32), real(real64), or real(real128). It is an
intent(inout) argument. On input it will be an array whose sorting
indices are to be determined. On output it will be the sorted
array.
index: shall be a rank one allocatable integer array of kind
int_size of the same size as array. It is an intent(inout) argument.
On return it shall have values that are the indices needed to sort
the original array in the desired direction.
work (optional): shall be a rank one array of the same type as array,
and shall have at least size(array)/2 elements. It
is an intent(out) argument. Its contents on return are undefined.
iwork (optional): shall be a rank one integer array of kind int_size,
and shall have at least size(array)/2 elements. It
is an intent(out) argument. Its contents on return are undefined.
reverse (optional): shall be a scalar of type default logical. It
is an intent(in) argument. If present with a value of .true. then
the indices in index will sort array in order of non-increasing
values in stable order. Otherwise the indices will sort array in
order of non-decreasing values in stable order.
zero_based (optional): shall be a scalar of type default logical.
It is an intent(in) argument. If present with the value .true.
the values of index will be for zero based array indexing,
otherwise the values of index will be for one's based array indexing.
Notes
ORD_SORTING implements the hybrid sorting algorithm of ORD_SORT.
As a merge sort based algorithm it is a stable sorting comparison
algorithm. If array is of any type REAL the behavior of
the sorting is undefined if any element of array is a NaN. The optional
work and iwork arrays replace "scratch" memory that would otherwise be
allocated on the stack.
I have implemented optional work arrays and the current testing code now does not run out of stack space when processing 2**20 element arrays with ORD_SORTING with the default Mac OS X maximum stack size of 8192 kbytes. However I discovered that my testing code was allocating space on the stack for expressions such as ARRAY(INDEX(:)), and those expressions had to be assigned to static storage before the stack problems were fully fixed.
@wclodius2 Thank you for this impressive work. Sorting algorithms are highly needed for multiple procedures in stdlib (e.g., #337, sorting arrays of sparse matrices, ...). Therefore I believe it would be a really nice addition.
To answer to (some of) your questions in a previous comment:
Is stdlib_sorting the ideal name for the module;
OK for me
Should the integer type used for indexing be INT32 or INT64;
If there is no size limit on the size of the array to be sorted, then I am in favor of int64 (I got too often seg faults with int32 in such algorithms).
What is the best name for the following generic subroutines identified by their tentative names:
A. UNORD_SORT, a subroutine optimized for sorting near random data,
B. ORD_SORT, a subroutine optimized for sorting data that has been partially sorted, and
C. ORD_SORTING, a subroutine designed to generate an array of indices that would sort the input array (this is intended to be used to sort an array of a derived type based on the values of a component of that type);
These names are fine for me, and are helpful for users who don't care about the sorting algorithm but want the best algorithm for sorting their array.
However, for other users, would it be also possible to provide the procedrues with names that refer to their algorithm (e.g., merge_sort, insertion_sort)?
@wclodius2 Thank you for this impressive work. Sorting algorithms are highly needed for multiple procedures in
stdlib(e.g., #337, sorting arrays of sparse matrices, ...). Therefore I believe it would be a really nice addition.
snip
Should the integer type used for indexing be INT32 or INT64;
If there is no size limit on the size of the array to be sorted, then I am in favor of
int64(I got too often seg faults withint32in such algorithms).
There are no obvious limits on the size of the arrays that can be processed. Maybe more thorough testing would find some. For example, UNORD_SORT with its recursion would exceed maximum stack space in processing a sufficiently large array. There might be limits on the static storage required to allow ORD_SORT and ORD_SORTING to process large arrays.
What is the best name for the following generic subroutines identified by their tentative names:
A. UNORD_SORT, a subroutine optimized for sorting near random data,
B. ORD_SORT, a subroutine optimized for sorting data that has been partially sorted, and
C. ORD_SORTING, a subroutine designed to generate an array of indices that would sort the input array (this is intended to be used to sort an array of a derived type based on the values of a component of that type);These names are fine for me, and are helpful for users who don't care about the sorting algorithm but want the best algorithm for sorting their array.
However, for other users, would it be also possible to provide the procedrues with names that refer to their algorithm (e.g.,merge_sort,insertion_sort)?
The current procedures are hybrid sorts, and most clearly identified by names that would be meaningless to most users.
ORD_SORT and ORD_SORTING are based on the rust sort of the Rust Language which is a simplified version of the timsort of Tim Peters. The core of the algorithm is an iterative (not recursive) merge sort. The algorithm examines the next "few" elements of the array for a run of uniformly decreasing values, or a run of uniformly non-decreasing values. If the run is uniformly decreasing then the order of the elements are reversed. If the resulting run is too short, it is padded to the desired minimum size and an insertion sort is performed on the run. Then the run is compared with previously identified runs to see if any recent runs should be merged, with an algorithm that ensures at worst O(N Ln(N)) runtime performance.
UNORD_SORT is based on the introsort (short for introspective sort) of David Musser. It combines three sorting algorithms: quicksort, heap sort, and insertion sort. First it examines the array and estimates the depth of recursion a quick sort would require for ideal (random) data, D = Ln(N)/Ln(2). It then defines a limit to the number of quicksort recursions to be allowed in processing, D_limit = factor * D, where factor is typically taken to be 2, and calls introsort proper. 'introsort` proper then:
- Examines the number of elements remaining to be sorted, and, if they are less than 16, sorts them using
insertion sortand returns; - If they are not less than 16, checks whether the current depth of recursion exceeds
D_limitand, if it does, processes the remaining elements withheap sortand returns; - If the current depth of recursion does not exceed
D_limit, then in effect do aquicksortstep:- Partition the remaining array using a median of three,
- Call
introsortproper on the leftmost partition, - Call
introsortproper on the rightmost partition, and then returns.
thanks for the detailed explanations. Based on these, the proposed names make sense to me.
I have started a pull request on the stdlib_sorting module so far focussed on the markdown document, stdlib_sorting.md. I would like some feedback on the overall API, before including the module proper and the testing code for approval.
Should stdlib also provide an interface to C's qsort function?
The motivation is flexibility -- it allows sorting arbitrary derived types with arbitrary user-defined comparison functions. The downside is the user needs to provide a comparison function, and the call requires using some iso_c_binding features. To me this makes it somewhat "non-fortranic", and I prefer to use alternatives such as those suggested above, if possible. But for "very general" cases, I don't know a better technique than using C's qsort.
If we were to do this, the sorting module would need to have an interface to C's qsort:
interface
subroutine qsort_C(array, elem_count, elem_size, compare) bind(C,name="qsort")
use iso_c_binding, only: c_ptr, c_size_t, c_funptr
implicit none
type(c_ptr), value :: array ! C-pointer to the first entry of the array
integer(c_size_t), value :: elem_count ! Number of elements in the array
integer(c_size_t), value :: elem_size ! Size of each element, according to c_sizeof()
type(c_funptr), value :: compare ! c_funptr to the user-provided comparison function
end subroutine qsort_C
end interfaceTo use it, the user would need to provide a function compare that takes 2 entries of array and returns -1, 0, or 1 depending on their order. For instance if we are sorting integers (edited on 18th April to make arguments of type c_ptr):
function test_integer_compare(i1ptr, i2ptr) result(sgn) bind(C)
type(c_ptr), value, intent(in) :: i1ptr, i2ptr
integer, pointer :: i1, i2 ! In more general cases these would be of type(my_derived_type)
integer(c_int) :: sgn
call c_f_pointer(i1ptr, i1)
call c_f_pointer(i2ptr, i2)
! The user defines what 'less than', 'equal', 'greater than' means by setting 'sgn'
if(i1 < i2) sgn = -1_c_int
if(i1 == i2) sgn = 0_c_int
if(i1 > i2) sgn = 1_c_int
end functionThe call to qsort_C is a bit ugly because one needs to make use of c_loc and c_funloc and c_sizeof.
subroutine test_qsort_C
integer, target :: array(5) ! In more general cases this could be of type(my_derived_type)
! To be sorted
array = [4, 3, 5, 1, 2]
! In-place sort of the array
call qsort_C( &
c_loc(array(1)), &
size(array, kind=c_size_t), &
c_sizeof(array(1)), &
c_funloc(test_integer_compare) )
if(all(array == [1, 2, 3, 4, 5])) then
print*, 'PASS'
else
print*, 'FAIL'
end if
end subroutineMy sense is that stdlib should provide something with this level of generality. Is there any better approach?
I now have three requested changes in the API. Let me know if you have strong preferences.
- Change the name of
ORD_SORTING. Possibilities include:ORD_SORT_INDEXORD_ARGSORTORD_INDEXSORT_INDEX- Maybe some other name?
- Add a
REVERSEsubroutine - Add a wrapper for the C/C++ library
qsort. This would allow direct sorting of arrays of a derived type.
Upon looking at the fortran standard in more detail, I've become uncertain about the correct use of C's qsort. I've raised a question here to clarify.
Based on the fortran discourse discussion I have corrected the qsort_C example above. The key point is that the comparison function should take input arguments of type(c_ptr), value, intent(in), and then do type casting. This will work for an arbitrary user-specified type. It will also work for intrinsic types. Interestingly it won't work for sorting character(len=*) -- for characters we need to specify len.
@gareth-nx , @jvdp1 and I are having a discussion of my results and its possible implications for the naming conventions of the stdlib_sorting module's API on #386 .
@ivan-pi on #386 has expressed concerns that the implemented sortings for character(*) arrays and string_type arrays (using the comparison operators and not the lexical comparison functions) may not match users needs and might be best removed from the API and my code.
Also note that the results of the testings may be of interest to @certik for his Fortran benchmarks project.
I have committed and pushed my source code to #386 . The testing harness found a couple of errors that didn't show up with my compilers on my computer. I have committed and pushed the fixed codes. The testing harness for the fixed codes does not report any further errors.
@gareth-nx Do you mind emailing me at milancurcic@hey.com? I can't find your contact info. I apologize for the off-topic message in this PR, but I can't find a better way on GitHub. :)