/Array-sorting

An array sorting library is presented and a brief explanation of the algorithm how it happened

Primary LanguageC#

Automated array sorting

Algorithms taken from articles on Habré and Wikipedia are presented in the "sorting algorithms" folder.


Realizable algorithms.


Below is a table with the main sorting features.

Name The best time Average Worse Memory
Bubble O(n) O(n^2) O(n^2) O(n)
Shaker O(n) O(n^2) O(n^2) O(n)
Insertion O(n) O(n^2) O(n^2) O(n)
Stooge O(n^(log 3) / log 1.5)) O(n^(log 3) / log 1.5)) O(n^(log 3) / log 1.5)) O(n)
Pancake O(n) O(n^2) O(n^2) O(n)
Shell O(n (log^2 n)) Depends on step selection O(n^2) O(n)
Merge O(n logn) O(n logn) O(n logn) O(n)
Selection O(n^2) O(n^2) O(n^2) O(n)
Quick O(n logn) O(n logn) O(n^2) O(logn)
Gnome O(n) O(n^2) O(n^2) O(n)
Tree O(n) O(n logn) O(n logn) O(n)
Comb O(n) O((n^2) / 2^p) O(n^2) O(n)
BasicCounting O(n) O(n+k) O(n+k) O(n+k)
CombinedBubble O(n) O(n^2) O(n^2) O(n)
Heapify O(n logn) O(n logn) O(n logn) O(n)
Cocktail O(n) O(n^2) O(n^2) O(n)
OddEven O(n) O(n^2) O(n^2) O(n)
Tim O(n) O(n logn) O(n logn) O(n)
Counting O(n+k) O(n+k) O(n) O(n+k)
Radix O(n × k) O(n × k) O(n) O(n × k)
Bucket O(n^2) O(n × k) O(n) O(n × k)
BinaryInsertion O(n) O(n logn) O(n logn) O(n)
Bogo O(n) O(n * n!) O(n * n!) O(n)
Cycle O(-) O(n^2) O(n^2) O(n)
Exchange O(n^2) O(n^2) O(n^2) O(n)
Heap O(n logn) O(n logn) O(n logn) O(n)
MSDRadix O(n * k) O(n * k) O(n logn) O(n)

Brief description of each algorithm taken from different sites.

Bubble

The algorithm consists of repeated passes through the sorted array. At each iteration, neighboring elements are sequentially compared, and if the order in the pair is incorrect, then the elements are swapped. For each pass through the array, at least one element falls into place, so it is necessary to make no more than n−1 passes, where n is the size of the array, to sort the array.

Below is the pseudocode for bubble sort, which takes the array a[0..n−1] as input.

function bubbleSort(a):
  for i = 0 to n - 2
    for j = 0 to n - 2
      if a[j] > a[j + 1]
        swap(a[j], a[j + 1])

10 -> 2000 Synthetic

Bubble

10 -> 2000 Real

Bubble

Shaker

A multiple run through the array is performed, neighboring elements are compared and, if necessary, swapped. When the end of the array is reached, the direction is reversed. Thus, large and small array elements are pushed in turn to the end and the beginning of the structure, respectively. Cocktail sort is also called two-way sorting by simple exchanges. There is a similar modification for selection sorting.

Cocktail sort in Python

def cocktail(data): 
    up = range(len(data) - 1)       
    while True:
        for indices in (up, reversed(up)):
            swapped = False
            for i in indices:
                if data[i] > data[i+1]:  
                    data[i], data[i+1] =  data[i+1], data[i]
                    swapped = True
            if not swapped:
                return data

10 -> 2000 Synthetic

Shaker

10 -> 10000 Real

Shaker

Insertion

The problem is this: there is a part of the array that is already sorted, and you want to insert the remaining elements of the array into the sorted part, while maintaining the order. To do this, at each step of the algorithm, we select one of the input data elements and insert it at the desired position in the already sorted part of the array, until the entire input data set is sorted. The method of selecting the next element from the input array is arbitrary, but usually (and in order to obtain a stable sorting algorithm), elements are inserted in the order of their appearance in the input array.

Since only neighboring elements can change places during the algorithm operation, each exchange reduces the number of inversions by one. Therefore, the number of exchanges is equal to the number of inversions in the original array, regardless of the sorting implementation. The maximum number of inversions is contained in an array whose elements are sorted in non-increasing order. The number of inversions in such an array is n(n−1)2.

The algorithm runs in O(n+k), where k is the number of exchanges of elements of the input array, equal to the number of inversions. On average and in the worst case - for O(n2). The minimum estimates occur in the case of an already ordered initial sequence of elements, the worst - when they are arranged in reverse order.

function insertionSort(a):
  for i = 1 to n - 1
    j = i - 1
    while j ⩾ 0 and a[j] > a[j + 1] 
      swap(a[j], a[j + 1])
      j--

10 -> 2000 Synthetic

Insertion

10 -> 10000 Real

Insertion

Stooge

Stooge sort (Stack sort[1], Wandering sort[2]) is a recursive sorting algorithm with time complexity {\displaystyle O(n^{\log _{1{,}5}{3)))\approx O( n^{2.71})}O(n^{{\log _{{1{,}5}}{3}}})\approx O(n^{{2.71}}). The running time of the algorithm is thus extremely long compared to efficient sorting algorithms such as Merge Sort.

The algorithm for sorting a list of elements is as follows:

If the value of the element at the end of the list is less than the value of the element at the beginning, then swap them. If there are 3 or more elements in the current list subset, then: Recursively call Stooge sort on the first 2/3 of the list Recursively call Stooge sort on the last 2/3 of the list Recursively call Stooge sort on the first 2/3 of the list again Else: return

 algorithm stoogesort(array L, i = 0, j = length(L)-1)
     if L[j] < L[i] then
         L[i] ↔ L[j]
     if j - i > 1 then
         t = (j - i + 1)/3
         stoogesort(L, i  , j-t)
         stoogesort(L, i+t, j  )
         stoogesort(L, i  , j-t)
     return L

10 -> 1000 Synthetic

Stooge

10 -> 1000 Real

Stooge

Pancake

Pancake sorting (from the English pancake sorting) is a sorting algorithm. The only operation allowed in the algorithm is to reverse the elements of the sequence up to some index. Unlike traditional algorithms that minimize the number of comparisons, pancake sort requires as few flips as possible. The process can be visualized as a stack of pancakes, which is shuffled by taking several pancakes from above and turning them over.

10 -> 2000 Synthetic

Pancake

10 -> 2000 Real

Pancake

Shell

Shellsort is a sorting algorithm that is an improved version of insertion sort.

Each pass in the algorithm is characterized by an offset hi, such that elements that are lagging behind each other by hi positions are sorted. Shell suggested using ht=N/2, ht−1=ht/2, … , h0=1. Other offsets are also possible, but h0=1 always.

Start. Step 0. i=t. Step 1. Let's split the array into lists of elements that are separated by hi. Such lists will be hi. Step 2. Sort the elements of each list by insertion sort. Step 3. Combine the lists back into an array. Decrease i. If i is non-negative, go back to step 1 End.

10 -> 10000 Synthetic

Shell

10 -> 10000 Real

Shell

Merge

The algorithm uses the “divide and conquer” principle: the problem is divided into smaller subproblems, which are solved separately, after which their solutions are combined to obtain a solution to the original problem. Specifically, the merge sort procedure can be described as follows:

If the array under consideration has one element, then it is already sorted - the algorithm terminates. Otherwise, the array is split into two parts, which are sorted recursively. After sorting the two parts of the array, the merge procedure is applied to them, which, using the two sorted parts, gets the original sorted array.

function merge(a : int[n]; left, mid, right : int):
    it1 = 0
    it2 = 0
    result : int[right - left]
  
    while left + it1 < mid and mid + it2 < right
        if a[left + it1] < a[mid + it2]
            result[it1 + it2] = a[left + it1]
            it1 += 1
        else
            result[it1 + it2] = a[mid + it2]
            it2 += 1
  
    while left + it1 < mid
        result[it1 + it2] = a[left + it1]
        it1 += 1
  
    while mid + it2 < right
        result[it1 + it2] = a[mid + it2]
        it2 += 1
  
    for i = 0 to it1 + it2
        a[left + i] = result[i]

10 -> 10000 Synthetic

Merge

10 -> 10000 Real

Merge

Selection

At each i-th step of the algorithm, we find the i-th minimum element and swap it with the i-th element in the array. This will result in an array sorted in non-descending order.

 function selectionSort(T[n] a):
   for i = 0 to n - 2
     for j = i + 1 to n - 1
       if a[i] > a[j]
         swap(a[i], a[j])

10 -> 2000 Synthetic

Selection

10 -> 2000 Real

Selection

Quick

Quick sort is one of the most famous and widely used sorting algorithms. The average running time is O(nlogn), which is the asymptotically optimal running time for a comparison-based algorithm. Although the running time of the algorithm for an array of n elements can be Θ(n2) in the worst case, in practice this algorithm is one of the fastest.

  void quicksort(a: T[n], int l, int r)
     if l < r
        int q = partition(a, l, r)
        quicksort(a, l, q)
        quicksort(a, q + 1, r)

10 -> 2000 Synthetic

Quick

10 -> 10000 Real

Quick

Gnome

"Gnome sorting is based on a technique used by a common Dutch garden gnome (Dutch. tuinkabouter). This is the method by which a garden gnome sorts a line of flower pots. It essentially looks at the next and previous garden pots: if they are in the correct order, it steps one pot forward, otherwise it swaps them and steps back one pot. Boundary conditions: if there is no previous pot, it steps forward; if there is no next pot, it is finished."

Dick Grun

Dwarf sorting is an optimized stupid sorting. In stupid sorting, when an unsorted pair of neighbors is found, an exchange occurs and a return to the beginning of the array. Dwarven sorting is just one step back.

Also, the algorithm is interesting in that only one cycle is used, which is very rare for sorting algorithms.

There is an optimized version for gnome sorting.

Cocktail sort in Python

def gnome(data):
    i, size = 1, len(data)
    while i < size:
        if data[i - 1] <= data[i]:
            i += 1
        else:
            data[i - 1], data[i] = data[i], data[i - 1] 
            if i > 1:
                i -= 1
    return data

10 -> 2000 Synthetic

Gnome

10 -> 10000 Real

Gnome

Tree

Binary search tree (BST) is a data structure for working with ordered sets. A binary search tree has the following property: if x is a binary tree node with key k, then all nodes in the left subtree must have keys less than k, and in the right subtree

func insert(x : Node, z : Node):            // x — корень поддерева, z — вставляемый элемент
   while x != null
     if z.key > x.key
        if x.right != null
           x = x.right
        else
           z.parent = x
           x.right = z
           break
     else if z.key < x.key
        if x.left != null
           x = x.left
        else
           z.parent = x
           x.left = z
           break

10 -> 2000 Synthetic

Tree

10 -> 2000 Real

Tree

Comb

Repeated passes are made through the array, in which pairs of elements are compared. If they are not sorted relative to each other, then an exchange is made. As a result, large elements migrate to the end of the array, and small ones migrate to the beginning.

In bubble sort, each time it passes through the array, adjacent elements are compared. Here, elements are compared, between which there is some fixed distance. With each subsequent passage, the distance decreases until it reaches the minimum value.

The decreasing distance between compared elements is calculated using a special value called the reduction factor. The length of the array is divided by this factor, and this is the gap between the indices. After each pass, the distance is divided by the reduction factor and thus a new value is obtained. In the end, it narrows down to the minimum value - one, and the array is simply re-sorted with the usual "bubble".

Cocktail sort in Python

def comb(data):
    gap = len(data)
    swaps = True
    while gap > 1 or swaps:
        gap = max(1, int(gap / 1.25))  # minimum gap is 1
        swaps = False
        for i in range(len(data) - gap):
            j = i + gap
            if data[i] > data[j]:
                data[i], data[j] = data[j], data[i]
                swaps = True
    return data

10 -> 10000 Synthetic

Comb

10 -> 10000 Real

Comb

BasicCounting

Counting sort[2] is a sorting algorithm that uses the range of numbers of the sorted array (list) to count matching elements. The use of counting sort is useful only when the sorted numbers have (or can be mapped to) a range of possible values ​​that is small enough compared to the set to be sorted, for example, a million natural numbers less than 1000.

Suppose the input array consists of {\displaystyle n}n integers ranging from {\displaystyle 0}{\displaystyle 0} to {\displaystyle k-1}k-1, where {\displaystyle k\in \mathbb { N} }k \in \mathbb N. Further, the algorithm will be generalized for an arbitrary integer range. There are several modifications of counting sort, below are three linear and one quadratic, which uses a different approach, but has the same name.

SimpleCountingSort:
    for i = 0 to k - 1
        C[i] = 0;
    for i = 0 to n - 1
        C[A[i]] = C[A[i]] + 1;
    b = 0;
    for j = 0 to k - 1
        for i = 0 to C[j] - 1
            A[b] = j;
            b = b + 1;

10 -> 100000 Synthetic

BasicCounting

10 -> 100000 Real

BasicCounting

CombinedBubble

Sort by simple exchanges, bubble sort is a simple sorting algorithm. To understand and implement this algorithm is the simplest, but it is effective only for small arrays. Complexity of the algorithm: {\displaystyle O}O{\displaystyle (n^{2})}(n^{2}).

The algorithm is considered educational and is practically not used outside the educational literature; instead, more efficient sorting algorithms are used in practice. At the same time, the exchange sort method underlies some of the more advanced algorithms, such as shaker sort, heap sort, and quick sort.

 FOR J=0 TO N-1 STEP 1
   F=0
   MIN=J
   FOR I=J TO N-1-J STEP 1 
     IF Y[I]>Y[I+1] THEN SWAP Y[I],Y[I+1]:F=1
     IF Y[I]<Y[MIN] THEN MIN=I
   NEXT I
   IF F=0 THEN EXIT FOR
   IF MIN<>J THEN SWAP Y[J],Y[MIN]
 NEXT J

10 -> 2000 Synthetic

CombinedBubble

10 -> 2000 Real

CombinedBubble

Heapify

Heap sort, heap sort (eng. Heapsort) is a sorting algorithm that uses a binary heap data structure. This is an unstable sorting algorithm with O(nlogn) running time, where n is the number of elements to sort, and using O(1) additional memory.

 fun heapSort(A : list <T>):
   buildHeap(A)
   heapSize = A.size
   for i = 0 to n - 1
     swap(A[0], A[n - 1 - i])
     heapSize--
     siftDown(A, 0, heapSize)

10 -> 10000 Synthetic

Heapify

10 -> 10000 Real

Heapify

Cocktail

A multiple run through the array is performed, neighboring elements are compared and, if necessary, swapped. When the end of the array is reached, the direction is reversed. Thus, large and small array elements are pushed in turn to the end and the beginning of the structure, respectively. Cocktail sort is also called two-way sorting by simple exchanges. There is a similar modification for selection sorting.

Cocktail sort in Python

def cocktail(data): 
    up = range(len(data) - 1)       
    while True:
        for indices in (up, reversed(up)):
            swapped = False
            for i in indices:
                if data[i] > data[i+1]:  
                    data[i], data[i+1] =  data[i+1], data[i]
                    swapped = True
            if not swapped:
                return data

10 -> 2000 Synthetic

Cocktail

10 -> 10000 Real

Cocktail

OddEven

A multiple run through the array is performed, neighboring elements are compared and, if necessary, swapped. Unlike bubble sort, the array step is two, not one.

First, elements with odd indices are compared/exchanged with elements with even indices (1st with 2nd, 3rd with 4th, 5th with 6th, etc.). Then elements with even indices are compared/exchanged with neighboring elements with odd indices (2nd with 3rd, 4th with 5th, 6th with 7th, etc.). Then again the odd ones are compared with the even ones, then again the even ones with the odd ones, and so on. The process ends if there were no exchanges as a result of two runs, which means the array is ordered.

OddEven sort in Python

def odd_even(data):
    n = len(data)
    isSorted = 0
    while isSorted == 0:
        isSorted = 1
        for i in range(1, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                isSorted = 0
                  
        for i in range(0, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                isSorted = 0
    return data

10 -> 2000 Synthetic

OddEven

10 -> 2000 Real

OddEven

Tim

Timsort is a hybrid sorting algorithm that combines insertion sort and merge sort, published in 2002 by Tim Peters. Timsort is currently the standard sorting algorithm in Python, OpenJDK 7, Apple Swift Standard Library 5, and implemented in Android JDK 1.5. The main idea of the algorithm is that in the real world sorted data arrays often contain ordered subarrays. On such data, Timsort is significantly faster than many sorting algorithms.

10 -> 20000 Synthetic

Tim

10 -> 20000 Real

Tim

Counting

Counting sort is a sorting technique based on keys between a specific range. It works by counting the number of objects having distinct key values (kind of hashing). Then do some arithmetic to calculate the position of each object in the output sequence.

Counting sort makes assumptions about the data, for example, it assumes that values are going to be in the range of 0 to 10 or 10 – 99 etc, Some other assumptions counting sort makes are input data will be all real numbers.

Like other algorithms this sorting algorithm is not a comparison-based algorithm, it hashes the value in a temporary count array and uses them for sorting.

Counting sort in Python

def countSort(arr):
    output = [0 for i in range(len(arr))]
    count = [0 for i in range(256)]
    ans = ["" for _ in arr]
    for i in arr:
        count[ord(i)] += 1
    for i in range(256):
        count[i] += count[i-1]
    for i in range(len(arr)):
        output[count[ord(arr[i])]-1] = arr[i]
        count[ord(arr[i])] -= 1
    for i in range(len(arr)):
        ans[i] = output[i]
    return ans
arr = "geeksforgeeks"
ans = countSort(arr)
print("Sorted character array is % s" %("".join(ans)))

10 -> 50000 Synthetic

Counting

10 -> 50000 Real

Counting

Bucket

Bucket sort is mainly useful when input is uniformly distributed over a range. For example, consider the following problem. Sort a large set of floating point numbers which are in range from 0.0 to 1.0 and are uniformly distributed across the range. How do we sort the numbers efficiently? A simple way is to apply a comparison based sorting algorithm. The lower bound for Comparison based sorting algorithm (Merge Sort, Heap Sort, Quick-Sort .. etc) is Ω(n Log n), i.e., they cannot do better than nLogn. Can we sort the array in linear time? Counting sort can not be applied here as we use keys as index in counting sort. Here keys are floating point numbers.
The idea is to use bucket sort. Following is bucket algorithm.

Bucket sort in Python

def insertionSort(b):
    for i in range(1, len(b)):
        up = b[i]
        j = i - 1
        while j >= 0 and b[j] > up:
            b[j + 1] = b[j]
            j -= 1
        b[j + 1] = up    
    return b                
def bucketSort(x):
    arr = []
    slot_num = 10 # 10 means 10 slots, each
                  # slot's size is 0.1
    for i in range(slot_num):
        arr.append([])
    for j in x:
        index_b = int(slot_num * j)
        arr[index_b].append(j)
    for i in range(slot_num):
        arr[i] = insertionSort(arr[i])
    k = 0
    for i in range(slot_num):
        for j in range(len(arr[i])):
            x[k] = arr[i][j]
            k += 1
    return x
x = [0.897, 0.565, 0.656,
     0.1234, 0.665, 0.3434]
print("Sorted Array is")
print(bucketSort(x))

10 -> 10000 Synthetic

Bucket

10 -> 10000 Real

Bucket

BinaryInsertion

We can use binary search to reduce the number of comparisons in normal insertion sort. Binary Insertion Sort find use binary search to find the proper location to insert the selected item at each iteration. In normal insertion, sort it takes O(i) (at ith iteration) in worst case. we can reduce it to O(logi) by using binary search.

BinaryInsertion sort in Python

def binary_search(arr, val, start, end):
    if start == end:
        if arr[start] > val:
            return start
        else:
            return start+1
    if start > end:
        return start
    mid = (start+end)/2
    if arr[mid] < val:
        return binary_search(arr, val, mid+1, end)
    elif arr[mid] > val:
        return binary_search(arr, val, start, mid-1)
    else:
        return mid
def insertion_sort(arr):
    for i in xrange(1, len(arr)):
        val = arr[i]
        j = binary_search(arr, val, 0, i-1)
        arr = arr[:j] + [val] + arr[j:i] + arr[i+1:]
    return arr

10 -> 5000 Synthetic

BinaryInsertion

10 -> 10000 Real

BinaryInsertion

Cycle

Cycle sort is an in-place sorting Algorithm, unstable sorting algorithm, a comparison sort that is theoretically optimal in terms of the total number of writes to the original array.

It is optimal in terms of number of memory writes. It minimizes the number of memory writes to sort (Each value is either written zero times, if it’s already in its correct position, or written one time to its correct position.) It is based on the idea that array to be sorted can be divided into cycles. Cycles can be visualized as a graph. We have n nodes and an edge directed from node i to node j if the element at i-th index must be present at j-th index in the sorted array. Cycle in arr[] = {4, 5, 2, 1, 5} Cycle in arr[] = {4, 3, 2, 1} We one by one consider all cycles. We first consider the cycle that includes first element. We find correct position of first element, place it at its correct position, say j. We consider old value of arr[j] and find its correct position, we keep doing this till all elements of current cycle are placed at correct position, i.e., we don\’t come back to cycle starting point.

Cycle sort in Python

def cycleSort(array):
  writes = 0
  for cycleStart in range(0, len(array) - 1):
    item = array[cycleStart]
    pos = cycleStart
    for i in range(cycleStart + 1, len(array)):
      if array[i] < item:
        pos += 1
    if pos == cycleStart:
      continue
    while item == array[pos]:
      pos += 1
    array[pos], item = item, array[pos]
    writes += 1
    while pos != cycleStart:
      pos = cycleStart
      for i in range(cycleStart + 1, len(array)):
        if array[i] < item:
          pos += 1
      while item == array[pos]:
        pos += 1
      array[pos], item = item, array[pos]
      writes += 1
  return writes

10 -> 3000 Synthetic

Cycle

10 -> 3000 Real

Cycle

Exchange

An exchange sort algorithm is one which compares adjacent elements and moves them to their correct position by swapping them based on a less-than rule. For example, while sorting to ascending order, we might swap if the element on the left is greater than the element on the right. Iteratively following this process gives us a final list where the input elements are sorted in ascending order.

Exchange sort in Python

def bubbleSort(items):
    swapped = True
    pass_count = 1
    while(swapped):
        print('Pass ' + str(pass_count))
        swapped = False
        for i in range(0, len(items) - 1):
            if items[i] > items[i + 1]:
                print('Swapping '
                      + str(items[i])
                      + ' and '
                      + str(items[i+1]))
                items[i], items[i + 1] = items[i+1], items[i]
                swapped = True
        print('After pass '
              + str(pass_count)
              + ', items are: '
              + str(items))
        pass_count += 1
    return items

10 -> 3000 Synthetic

Exchange

10 -> 3000 Real

Exchange

Heap

Heapsort is a comparison based sorting technique based on a Binary Heap data structure. It is similar to selection sort where we first find the maximum element and place the maximum element at the end. We repeat the same process for the remaining element.

Heap sort in Python

def heapify(arr, n, i):
    largest = i  # Initialize largest as root
    l = 2 * i + 1     # left = 2*i + 1
    r = 2 * i + 2     # right = 2*i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        arr[i],arr[largest] = arr[largest],arr[i]  # swap
        heapify(arr, n, largest)

10 -> 10000 Synthetic

Heap

10 -> 10000 Real

Heap

MSDRadix

In this article, two types of Radix Sort are discussed:

LSD Radix Sort: It starts sorting from the end of strings (the Least significant digit). MSD Radix Sort: It starts sorting from the beginning of strings (the Most significant digit). In this article, the task is to discuss the MSD Radix Sort and compare it with LSD Radix Sort.

Approach: The idea is to perform the following steps for each digit i where the value of i varies from the most significant digit to the least significant digit:

Store elements in different buckets according to their ith digit. Recursively sort each bucket that has more than one element.

10 -> 200 Synthetic

MSDRadix

10 -> 100 Real

MSDRadix


Why you need to use custom sorting algorithms. Below is a graph of the speed of the "BasicCounting" and "Array.Sort" algorithms for standard c# sorting.

  • Graph in red "Array.Sort".
  • Graph in black "BasicCounting".

From this we can conclude that the custom algorithm is faster than the standard one.

comparison