Merge Sort algorithm implemented in C, available in three versions:
- Sequential
- Parallel with OpenMP
- Parallel with Pthreads (Linux only)
Merge sort is a divide and conquer algorithm that was invented by John von Neumann in 1945.
A detailed description of the algorithm can be found here.
The implemented algorithms are the following:
- Merge Sort
- Parallel Merge Sort
- Parallel Merge Sort using OpenMP
- Parallel Merge Sort using PThread
- C90
- GNU90 (PThread)
- CMake or Make
- C90 compiler (GCC, Clang, MSVC, ...)
- Description
- Dependencies
- Table of Contents
- Quickstart
- Algorithms
- Details on the implementation
- Results
- How to use
- Speed test
- Compilation
- Project Architecture
- GitHub Actions
- Documentations
- Contributors
Depending on your operating system, you will need to install some libs. They are installed differently depending on your system, so please follow the instructions in the Compilation section.
For an explanation of how to use the executables, go to the How to use section.
The different algorithms used are described below.
For more details about them, refer to:
Introduction to Algorithms, 3rd Edition, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
mergeSort(A, n, temp)
    if n > 1
        mergeSort(A, n/2, temp)
        mergeSort(A + n/2, n - n/2, temp)
        merge(A, n, temp)

merge(A, n, temp)
    i = 0
    j = n/2
    k = 0
    while i < n/2 and j < n
        if A[i] < A[j]
            temp[k] = A[i]
            i++
        else
            temp[k] = A[j]
            j++
        k++
    while i < n/2
        temp[k] = A[i]
        i++
        k++
    while j < n
        temp[k] = A[j]
        j++
        k++
    for i = 0 to n-1
        A[i] = temp[i]
mergeSort(A, p, r)
    if p < r
        q = (p+r)/2
        mergeSort(A, p, q)
        mergeSort(A, q+1, r)
        merge(A, p, q, r)

merge(A, p, q, r)
    n1 = q - p + 1
    n2 = r - q
    let L[1..n1+1] and R[1..n2+1] be new arrays
    for i = 1 to n1
        L[i] = A[p+i-1]
    for j = 1 to n2
        R[j] = A[q+j]
    L[n1+1] = ∞
    R[n2+1] = ∞
    i = 1
    j = 1
    for k = p to r
        if L[i] <= R[j]
            A[k] = L[i]
            i = i + 1
        else
            A[k] = R[j]
            j = j + 1
The chosen algorithm is the first one: it is the simplest to understand and the most efficient.
It is the most efficient because it only creates one buffer array named temp of size n, instead of two arrays of size n/2 at each recursive call of the merge function.
When comparing the two algorithms, the first one was about twice as fast as the second for every array size tested.
Note For the parallel algorithms, the first algorithm is used.
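For reference, here is a minimal C sketch of this first algorithm, directly translated from the pseudocode above (the actual mergeSortSequential.c in the repository may differ in naming and error handling):

```c
#include <stdlib.h>

/* Merge the two sorted halves A[0..n/2) and A[n/2..n) into temp, then copy back into A. */
static void merge(int *A, size_t n, int *temp)
{
    size_t i = 0, j = n / 2, k = 0;

    while (i < n / 2 && j < n) {
        if (A[i] < A[j])
            temp[k++] = A[i++];
        else
            temp[k++] = A[j++];
    }
    while (i < n / 2)
        temp[k++] = A[i++];
    while (j < n)
        temp[k++] = A[j++];

    for (i = 0; i < n; i++)
        A[i] = temp[i];
}

/* Recursive merge sort reusing a single buffer of size n for every call. */
void mergeSort(int *A, size_t n, int *temp)
{
    if (n > 1) {
        mergeSort(A, n / 2, temp);
        mergeSort(A + n / 2, n - n / 2, temp);
        merge(A, n, temp);
    }
}
```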
The parallel merge sort is a straightforward parallelization of the merge sort algorithm: it follows a fork-join pattern, spawning a new worker (an OpenMP task or a Pthread) for each recursive call.
The merge function is the same as the one used in the sequential merge sort.
The algorithm is written using Cilk syntax:
mergeSortParallel(A, n, temp)
    if n > 1
        spawn mergeSortParallel(A, n/2, temp)
        mergeSortParallel(A + n/2, n - n/2, temp)
        sync
        merge(A, n, temp)

merge(A, n, temp)
    i = 0
    j = n/2
    k = 0
    while i < n/2 and j < n
        if A[i] < A[j]
            temp[k] = A[i]
            i++
        else
            temp[k] = A[j]
            j++
        k++
    while i < n/2
        temp[k] = A[i]
        i++
        k++
    while j < n
        temp[k] = A[j]
        j++
        k++
    for i = 0 to n-1
        A[i] = temp[i]
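As an illustration, a minimal Pthread version of this spawn-per-call scheme could look like the sketch below (before the threshold optimization described next). The SortArgs struct and function names are illustrative, not the repository's exact code; note that the right half is given a disjoint part of the buffer so the two halves can merge concurrently without racing.

```c
#include <pthread.h>
#include <stdlib.h>

void merge(int *A, size_t n, int *temp);               /* same merge as the sequential version */
void mergeSortSequential(int *A, size_t n, int *temp);

/* Arguments passed to each sorting thread. */
typedef struct {
    int *array;
    size_t size;
    int *buffer;
} SortArgs;

/* Thread entry point: sorts its sub-array, spawning one child thread for the left half. */
static void *mergeSortThread(void *raw)
{
    SortArgs *args = (SortArgs *)raw;
    size_t n = args->size;

    if (n > 1) {
        SortArgs left, right;
        pthread_t child;

        left.array = args->array;          /* left half keeps the start of the buffer */
        left.size = n / 2;
        left.buffer = args->buffer;

        right.array = args->array + n / 2; /* right half gets a disjoint buffer region */
        right.size = n - n / 2;
        right.buffer = args->buffer + n / 2;

        if (pthread_create(&child, NULL, mergeSortThread, &left) != 0) {
            mergeSortSequential(args->array, n, args->buffer); /* fall back if no thread available */
            return NULL;
        }
        mergeSortThread(&right);           /* sort the right half in the current thread */
        pthread_join(child, NULL);         /* "sync": wait for the left half */
        merge(args->array, n, args->buffer);
    }
    return NULL;
}
```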
The parallel algorithms are customized to prevent the creation of threads when the array is too small.
The mergeSort function uses a minimum size, MULTITHREAD_THRESHOLD (defined in the header file).
If the array size is smaller than this threshold, the algorithm is executed sequentially, preventing the performance loss of creating threads for small arrays.
This also applies to the recursive calls of the mergeSort function: once the recursion has divided the array below the threshold, the remaining calls are executed sequentially.
The algorithm is the following:
mergeSortParallel(A, n, temp)
    if n < MULTITHREAD_THRESHOLD
        mergeSortSequential(A, n, temp)
    else
        spawn mergeSortParallel(A, n/2, temp)
        mergeSortParallel(A + n/2, n - n/2, temp)
        sync
        merge(A, n, temp)
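A minimal OpenMP sketch of this threshold scheme is shown below; the threshold value and the use of tasks are assumptions, and the repository's OpenMP and Pthread versions may implement the cutoff differently.

```c
#include <omp.h>
#include <stdlib.h>

/* Assumed value for illustration; the real one is defined in the project's header file. */
#define MULTITHREAD_THRESHOLD 1000

void merge(int *A, size_t n, int *temp);               /* same merge as the sequential version */
void mergeSortSequential(int *A, size_t n, int *temp);

/* Recurse in parallel tasks above the threshold, sequentially below it. */
void mergeSortParallel(int *A, size_t n, int *temp)
{
    if (n < MULTITHREAD_THRESHOLD) {
        mergeSortSequential(A, n, temp);
        return;
    }

    #pragma omp task firstprivate(A, n, temp)
    mergeSortParallel(A, n / 2, temp);                  /* left half in a new task */

    mergeSortParallel(A + n / 2, n - n / 2, temp + n / 2); /* right half, disjoint buffer region */

    #pragma omp taskwait                                /* "sync": wait for the left half */
    merge(A, n, temp);
}

/* Entry point: create the thread team once, then start the recursion from a single thread. */
void mergeSort(int *A, size_t n, int *temp, int numThreads)
{
    #pragma omp parallel num_threads(numThreads)
    {
        #pragma omp single
        mergeSortParallel(A, n, temp);
    }
}
```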
| Number of int (Array size) | Number of threads | Sequential time (Work) (s) | Parallel OpenMP time (s) | Parallel Pthread time (s) |
|---|---|---|---|---|
| 1000000 | 1 | 0.219678 | 0.21941 | 0.23636 |
| | 2 | | 0.140376 | 0.137846 |
Click here to get the details for an array of 10 to 10 000 000 ints.
| Number of int (Array size) | Number of threads | Sequential time (Work) (s) | Parallel OpenMP time (s) | Parallel Pthread time (s) |
|---|---|---|---|---|
| 10 | 1 | 0.000023 | 0.000079 | 0.000012 |
| | 2 | | 0.000216 | 0.000011 |
| | 4 | | 0.000439 | 0.000011 |
| | 8 | | 0.007493 | 0.000011 |
| | 16 | | 0.000752 | 0.000012 |
| | 24 | | 0.000861 | 0.000012 |
| | 48 | | 0.00178 | 0.000012 |
| 100 | 1 | 0.000034 | 0.000039 | 0.000025 |
| | 2 | | 0.000228 | 0.000025 |
| | 4 | | 0.000211 | 0.000032 |
| | 8 | | 0.005617 | 0.000025 |
| | 16 | | 0.000866 | 0.000025 |
| | 24 | | 0.001311 | 0.000025 |
| | 48 | | 0.002728 | 0.000026 |
| 1000 | 1 | 0.000383 | 0.000387 | 0.000159 |
| | 2 | | 0.00411 | 0.001905 |
| | 4 | | 0.000531 | 0.000493 |
| | 8 | | 0.007541 | 0.000425 |
| | 16 | | 0.000808 | 0.000354 |
| | 24 | | 0.001013 | 0.000369 |
| | 48 | | 0.002234 | 0.000368 |
| 10000 | 1 | 0.003015 | 0.001617 | 0.00202 |
| | 2 | | 0.002415 | 0.001964 |
| | 4 | | 0.002049 | 0.002111 |
| | 8 | | 0.003257 | 0.001249 |
| | 16 | | 0.003079 | 0.003737 |
| | 24 | | 0.003156 | 0.003423 |
| | 48 | | 0.00544 | 0.00294 |
| 100000 | 1 | 0.01854 | 0.018547 | 0.199 |
| | 2 | | 0.021651 | 0.015189 |
| | 4 | | 0.01688 | 0.008408 |
| | 8 | | 0.018882 | 0.01606 |
| | 16 | | 0.013631 | 0.016771 |
| | 24 | | 0.016659 | 0.02296 |
| | 48 | | 0.0227 | 0.035204 |
| 1000000 | 1 | 0.219678 | 0.21941 | 0.23636 |
| | 2 | | 0.140376 | 0.137846 |
| | 4 | | 0.1255587 | 0.103484 |
| | 8 | | 0.102441 | 0.133948 |
| | 16 | | 0.082464 | 0.092748 |
| | 24 | | 0.088869 | 0.294837 |
| | 48 | | 0.077861 | 0.131344 |
| 10000000 | 1 | 2.568292 | 2.564808 | 2.768002 |
| | 2 | | 1.473908 | 1.47016 |
| | 4 | | 1.177107 | 1.08515 |
| | 8 | | 0.892717 | 1.117797 |
| | 16 | | 0.754185 | 0.924634 |
| | 24 | | 0.738589 | 0.786672 |
| | 48 | | 0.714466 | 0.709331 |
Note
The results are indicative and may vary depending on the machine.
First you need to build the project (check the Compilation section).
Once done, you have three or four executables:
- fileGenerator
- PThreadVersion (only for Linux)
- OpenMPVersion
- sequentialVersion
The fileGenerator executable is only used to generate the files that will be used by the other merge sort executables.
If you don't want to use it, you can use the file named test_file.txt at the root of the project.
A file is composed of several lines:
- the first one is the number of elements in the array
- the following ones are the elements of the array, one per line
For example:
4
11
2
3
400
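For illustration, reading a file in this format (from standard input) can be done with a sketch like the one below; the actual parsing in the repository's main.c files may differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an array in the format above from stdin: the element count, then one int per line. */
int main(void)
{
    unsigned long n, i;
    int *array;

    if (scanf("%lu", &n) != 1)
        return EXIT_FAILURE;

    array = malloc(n * sizeof(int));
    if (array == NULL)
        return EXIT_FAILURE;

    for (i = 0; i < n; i++) {
        if (scanf("%d", &array[i]) != 1) {
            free(array);
            return EXIT_FAILURE;
        }
    }

    /* ... sort the array and print the result here ... */

    free(array);
    return EXIT_SUCCESS;
}
```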
All the executables work the same way: use a pipe or input redirection to feed the input file to the executable.
For example:
./mergeSortSequential < ./test_file
or
cat ./test_file.txt | ./mergeSortSequential
You can then redirect the output to a file using > or >>.
Example:
./mergeSortSequential < ./test_file.txt > ./output_file.txt
The output file will now contain the sorted array.
The parallel versions of the merge sort have an additional argument, which is the number of threads to use.
For example:
./mergeSortOpenMP < ./test_file > ./output_file.txt 4
The parallel merge sort using OpenMP is started with 4 threads.
The output shows some details about the execution of the algorithm:
- The number of int in the array
- The number of threads used (only for the parallel versions)
- The CPU Time taken to sort the array
- The Wall Time taken to sort the array
- The sorted array (only if the array is small enough)
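As an example, the CPU time and wall time can be measured around the sort roughly as follows (using clock and omp_get_wtime here; the repository may use other timing functions, e.g. for the Pthread version):

```c
#include <stdio.h>
#include <time.h>
#include <omp.h>

void mergeSortParallel(int *A, size_t n, int *temp);    /* any of the sort functions */

/* Print the CPU time and the wall time taken by the sort. */
void timedSort(int *array, size_t n, int *buffer)
{
    clock_t cpuStart = clock();
    double wallStart = omp_get_wtime();

    mergeSortParallel(array, n, buffer);

    printf("CPU Time:  %f s\n", (double)(clock() - cpuStart) / CLOCKS_PER_SEC);
    printf("Wall Time: %f s\n", omp_get_wtime() - wallStart);
}
```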
The project is set up with some bash
scripts to test the speed of the different algorithms.
You can start the test by running the speedTest.sh
script in the speedTest
folder.
./speedTest.sh <base array size> <multiplier> <iteration number>
Example:
./speedTest.sh 10 10 6
It will generate 6 files with 10, 100, 1000, 10000, 100000, 1000000 elements in the array (they will be located in
the speedTest/speedTestArrays
folder).
It will then run the different algorithms on the different files and output the results in the speedTest/outputs
folder.
Warning
You need to build the project before running the script (check the Compilation section). You also need to be in the speedTest folder to run the script correctly.
To compile the app, the first thing you need to do is install a C compiler and a build tool:
- Visual Studio (MSVC)
- Mingw
- GCC
- Cmake
- Make
- ...
First download CMake:
https://cmake.org
Or install it with your package manager under Linux:
sudo apt install cmake
Once your environment is set up, depending on your operating system you'll need to install some libs before compiling the project. Refer to the Windows or Linux section below.
Windows users can directly compile the project by typing the following command at the project root folder:
cmake -B . -DCMAKE_BUILD_TYPE=Release
Note
If you're using Visual Studio, you can install CMake directly from the IDE (Visual Studio Installer). Then you need to open the Project as a CMake Project, not a Visual Studio Project!
Linux users need to install some libs before compiling the project:
First you need to install gcc
by typing the following command:
sudo apt-get install gcc
Then install CMake by typing the following command:
sudo apt-get install cmake
You also need to install the OpenMP library by typing the following command:
sudo apt-get install libomp-dev
You are now able to compile the project. Go to the project root and type the following command:
cmake -B . -DCMAKE_BUILD_TYPE=Release
You can also use the Makefile to compile the project.
First, Linux users need to install make and gcc:
sudo apt-get install gcc
sudo apt-get install make
Then, at the project root, you can compile the whole project for Linux:
make
or compile only the Linux targets:
make LinuxVer
To compile only one of the scripts, type:
make <script_name>
The Linux script names are:
fileGen
mergeSortSeq
mergeSortParOpenMp
mergeSortParPthread
The executables will be created in the BuildMakeFile
folder.
ParticleEngine
├── .github
│   ├── workflows
│   │   ├── c-cpp.yml
│   │   ├── cmake.yml
│   │   ├── codeql.yml
│   │   ├── cpp-linter.yml
│   │   ├── dependency-review.yml
│   │   ├── flawfinder.yml
│   │   ├── greetings.yml
│   │   ├── label.yml
│   │   ├── msvc.yml
│   │   ├── stale.yml
│   ├── labeler.yml
│   ├── release.yml
├── buildMakeFile
│   ├── readme.txt
│   ├── test_file.txt
├── commonFunctions
│   ├── commonFunctions.h
│   ├── commonFunctions.c
├── fileGenerator
│   ├── CMakelists.txt
│   ├── main.c
├── OpenMpVersion
│   ├── CMakelists.txt
│   ├── main.c
│   ├── mergeSortParallelOpenMP.c
│   ├── mergeSortParallelOpenMP.h
├── PThreadVersion
│   ├── CMakelists.txt
│   ├── main.c
│   ├── mergeSortParallelPThread.c
│   ├── mergeSortParallelPThread.h
├── sequentialVersion
│   ├── CMakelists.txt
│   ├── main.c
│   ├── mergeSortSequential.c
│   ├── mergeSortSequential.h
├── speedTest
│   ├── outputs
│   │   ├── readme.txt
│   ├── speedTestArrays
│   │   ├── readme.txt
│   ├── fileGenerator.sh
│   ├── multithreads_comparison.xlsx
│   ├── multithreads_comparison.csv
│   ├── speedTest.sh
├── test
│   ├── CMakelists.txt
│   ├── mergeSortOpenMPTest.c
│   ├── mergeSortPThreadTest.c
│   ├── mergeSortSequentialTest.c
│   ├── sortFunctions.h
├── ParticleEngine
├── .clang-format
├── .clang-tidy
├── .editorconfig
├── .gitattributes
├── .gitignore
├── CMakelists.txt
├── CMakePresets.json
├── CMakeSettings.json
├── Makefile
├── README.md
- CMake: This script is used to build the project.
- MSVC: This script is used to analyze the code with Microsoft C++ Code Analysis.
- CodeQL: This script is used to analyze the code with CodeQL.
- cpp-linter: This script is used to check the C code formatting.
- flawfinder: This script is used to analyze the code with flawfinder.
- c-cpp: This script is used to build the project with C/C++ CI using make.
Introduction to Algorithms, 3rd Edition, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
Wikipedia:
https://en.wikipedia.org/wiki/Merge_sort
OpenMP:
https://www.openmp.org
OpenMP CMake:
https://cliutils.gitlab.io/modern-cmake/chapters/packages/OpenMP.html
StackOverflow:
https://stackoverflow.com/questions/52767944/merge-sort-with-pthreads-in-c
https://stackoverflow.com/questions/67131148/how-to-do-merge-sort-without-using-additional-arrays-for-splitting-the-initial-a
cliutils:
https://cliutils.gitlab.io/modern-cmake/chapters/packages/OpenMP.html
Quentin MOREL:
- @Im-Rises
- https://github.com/Im-Rises