Implementations of LU decomposition (without partial pivoting) for solving linear systems, using OpenMP, CUDA, and SYCL.
Since this project uses CUDA, which is only available for NVIDIA GPUs, you need a compatible GPU to run the CUDA and SYCL targets. We also assume that you have the corresponding drivers installed. On Linux, you can run `nvidia-smi` to check that the drivers are installed correctly.
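As a quick sanity check, the following snippet (assuming the tools are on your `PATH`; adjust if CUDA is installed elsewhere) reports whether the driver utility and the CUDA compiler are visible:

```shell
# Report whether the NVIDIA driver utility and the CUDA compiler are on PATH.
missing=0
for tool in nvidia-smi nvcc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: not found"
    missing=$((missing + 1))
  fi
done
echo "missing tools: $missing"
```

If either tool is reported missing, install the driver and CUDA toolkit before building the CUDA and SYCL targets.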
- GNU Make, as the build system.
- A C++ compiler compatible with C++20 and OpenMP 4.5 or later. We only tested with `g++`, but other compilers should work.
- `nvcc`, the NVIDIA CUDA compiler. We only tested with `nvcc`.
- `icpx`, Intel's Data Parallel C++ compiler, which implements the SYCL specification.
  - Follow this guide on how to install the CUDA backend extension.
  - Our implementation uses the SYCL 2020 specification.
The installation of these dependencies varies from system to system; it is up to you to figure out how to install and run them on your system.
First, if you wish to build the `lusycl` target, you must set up the oneAPI environment:

```shell
. /opt/intel/oneapi/setvars.sh --include-intel-llvm
```
After that, the following commands can be used to build and run each target:
```shell
make [all | lu | lublk | luomp | lucuda | lusycl]  # Builds a target

./bin/lu.out      # Serial LU decomposition
./bin/lublk.out   # Block-based serial LU decomposition
./bin/luomp.out   # Block-based parallel LU decomposition using OpenMP
./bin/lucuda.out  # Block-based parallel LU decomposition using CUDA
./bin/lusycl.out  # Block-based parallel LU decomposition using SYCL

make clean        # Removes all the produced executables
```