XFLUIDS: A SYCL-based unified cross-architecture heterogeneous simulation solver for compressible reacting flows

XFLUIDS is a parallelized SYCL C++ solver for large-scale high-resolution simulations of compressible multi-component reacting flows. It is developed by [Prof. Shucheng Pan's](https://teacher.nwpu.edu.cn/span.html) group at the School of Aeronautics, Northwestern Polytechnical University.

Main developers:

Jinlong Li (ljl66623@mail.nwpu.edu.cn)
Shucheng Pan (shucheng.pan@nwpu.edu.cn)

Other contributors:

Yixuan Lian, Renfei Zhang

References

If you use XFLUIDS for academic applications, please cite our paper:

Jinlong Li, Shucheng Pan (2024). XFLUIDS: A SYCL-based unified cross-architecture heterogeneous simulation solver for compressible reacting flows. arXiv:2403.05910. (https://arxiv.org/abs/2403.05910)

Features

Supports CPU, GPU (integrated & discrete), and FPGA without porting code
General for multi-vendor devices (Intel/NVIDIA/AMD/Hygon ... )
High portability, productivity, and performance
GPU-aware MPI
Highly optimized kernels & device functions for multicomponent flows and chemical reactions

1. Dependencies before cmake

1.1. If using AdaptiveCpp (formerly known as OpenSYCL/hipSYCL; recommended)

1.1.1. Install Boost 1.83 (required by AdaptiveCpp)
1.1.2. Install AdaptiveCpp following its installation instructions; different backends need different dependencies
1.1.3. Add the bin, lib, and include directories of AdaptiveCpp and its dependencies to the environment paths
1.1.4. XFLUIDS uses find_package(AdaptiveCpp) to target the AdaptiveCpp compile system; set the cmake option AdaptiveCpp_DIR:
```
cmake -DAdaptiveCpp_DIR=/path/to/AdaptiveCpp/lib/cmake/AdaptiveCpp ..
```

Device discovery: run "acpp-info" on the command line to list the available backends and devices

```
$ acpp-info
=================Backend information===================
Loaded backend 0: OpenMP
  Found device: hipSYCL OpenMP host device
Loaded backend 1: CUDA
  Found device: NVIDIA GeForce RTX 3070
=================Device information===================
***************** Devices for backend OpenMP *****************
Device 0:
General device information:
  Name: hipSYCL OpenMP host device
  Backend: OpenMP
  Vendor: the hipSYCL project
  Arch: <native-cpu>
  Driver version: 1.2
  Is CPU: 1
  Is GPU: 0
***************** Devices for backend CUDA *****************
Device 0:
General device information:
  Name: NVIDIA GeForce RTX 3070
  Backend: CUDA
  Vendor: NVIDIA
  Arch: sm_86
  Driver version: 12000
  Is CPU: 0
  Is GPU: 1
```

1.2. If using Intel oneAPI (only recommended on Intel GPU platforms)

1.2.1. Intel oneAPI version >= 2023.0.0 as the compiler
1.2.2. Codeplay solutions for the NVIDIA and AMD backends if GPU targets are needed
1.2.3.1. Activate the oneAPI environment with the Codeplay solution libraries appended
```
source /opt/intel/oneapi/setvars.sh --force --include-intel-llvm
```
1.2.3.2. Or use the script file (only the basic environment is included)
```
source ./scripts/oneAPI/oneapi_base.sh
```
1.2.4. libboost_filesystem as an external library when the filesystem library bundled with GCC is missing

Device discovery: run "sycl-ls" on the command line to list the available devices

```
$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 7 5800X 8-Core Processor 3.0
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA T600 0.0 [CUDA 11.5]
```

2. Select target device in SYCL project

Set the integers platform_id and device_id to target different backends (via "DeviceSelect" in the json file or the -dev option):
```
auto device = sycl::platform::get_platforms()[platform_id].get_devices()[device_id];
sycl::queue q(device);
```
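
Indexing the vectors returned by get_platforms() and get_devices() with an out-of-range ID is undefined behaviour, so it can help to validate the pair before building the queue. The sketch below is a hypothetical helper (check_device_selection is not part of XFLUIDS) that works on plain per-platform device counts; in a real program these counts would presumably be taken from sycl::platform::get_platforms() and get_devices(). It is written in standard C++ so it can be tried without a SYCL toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not an XFLUIDS function.
// device_counts[p] holds the number of devices found on platform p.
// Returns an empty string when (platform_id, device_id) is valid,
// otherwise a human-readable error message.
std::string check_device_selection(const std::vector<std::size_t>& device_counts,
                                   int platform_id, int device_id) {
    if (platform_id < 0 ||
        static_cast<std::size_t>(platform_id) >= device_counts.size()) {
        return "platform_id " + std::to_string(platform_id) +
               " out of range (" + std::to_string(device_counts.size()) +
               " platforms found)";
    }
    const std::size_t n_devices =
        device_counts[static_cast<std::size_t>(platform_id)];
    if (device_id < 0 || static_cast<std::size_t>(device_id) >= n_devices) {
        return "device_id " + std::to_string(device_id) +
               " out of range (" + std::to_string(n_devices) +
               " devices on platform " + std::to_string(platform_id) + ")";
    }
    // Invariant: a valid device_id implies the platform has at least one device.
    assert(n_devices > 0);
    return std::string();
}
```

With two platforms that each expose one device, the pair (1, 0) passes the check, while (2, 0) and (1, 1) each produce an error message instead of an out-of-bounds access.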

3. Compile and usage of this project

3.1. Read the root <XFLUIDS/CMakeLists.txt>

CMAKE_BUILD_TYPE is set to "Release" by default; SYCL code targets the host when ${CMAKE_BUILD_TYPE}==Debug
Set INIT_SAMPLE to the problem being tested; the path to "species_list.dat" and "reaction_list.dat" should be given to MIXTURE_MODEL
MPI and GPU-aware MPI (AWARE_MPI) are supported; AWARE_MPI needs a GPU-enabled MPI build, see section 4 (MPI libs)
VENDOR_SUBMIT allows some kernels to be submitted through the native CUDA/HIP model for parallelism tuning on the corresponding GPUs; it is only supported in the AdaptiveCpp compile environment

3.2. BUILD and RUN

3.2.1. Build with cmake

```
cd ./XFLUIDS
mkdir build && cd ./build && cmake .. && make -j
```

3.2.2. Running on a local machine
XFLUIDS automatically reads the <XFLUIDS/settings/*.json> file that matches the INIT_SAMPLE setting
```
./XFLUIDS
```
Append options to XFLUIDS on the command line to use other settings; every option is optional, and all of them are listed in section 6 (Executable file options)
```
./XFLUIDS -dev=1,1,0
mpirun -n mx*my*mz ./XFLUIDS -mpi=mx,my,mz -dev=1,0,0
```
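
In the mpirun line above, the rank count given to -n is the product mx*my*mz of the values passed to -mpi. As a rough illustration, assuming the value is a plain comma-separated list of positive integers, the hypothetical helpers below (parse_mpi_grid and rank_count are illustrative names, not XFLUIDS functions) compute that product.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses the value of -mpi, e.g. "2,2,1", into {mx, my, mz}.
// Throws std::invalid_argument when there are not exactly three fields
// or when a field is not a positive integer.
std::array<int, 3> parse_mpi_grid(const std::string& value) {
    std::array<int, 3> grid{};
    std::istringstream in(value);
    std::string token;
    std::size_t i = 0;
    while (std::getline(in, token, ',')) {
        if (i >= grid.size()) {
            throw std::invalid_argument("expected exactly mx,my,mz");
        }
        std::size_t used = 0;
        // std::stoi throws if the token does not start with a number.
        const int n = std::stoi(token, &used);
        if (used != token.size() || n <= 0) {
            throw std::invalid_argument("grid sizes must be positive integers");
        }
        grid[i++] = n;
    }
    if (i != grid.size()) {
        throw std::invalid_argument("expected exactly mx,my,mz");
    }
    return grid;
}

// Number of MPI ranks implied by the Cartesian grid: mx * my * mz.
int rank_count(const std::array<int, 3>& grid) {
    assert(grid[0] > 0 && grid[1] > 0 && grid[2] > 0);
    return grid[0] * grid[1] * grid[2];
}
```

For instance, -mpi=2,2,1 corresponds to mpirun -n 4.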

3.2.3. Running with Slurm sbatch on the Hygon (KunShan) supercomputing center

```
cd ./XFLUIDS/scripts/KS-DCU
sbatch ./1node.slurm
sbatch ./2node.slurm
```

4. MPI libs

4.1. Set MPI_PATH, which cmake searches, before building

The cmake system of this project automatically searches for libmpi.so in ${MPI_PATH}/lib; export MPI_PATH so that it points to the MPI installation you want:
```
export MPI_PATH=/home/ompi
```

4.2. The values of MPI_HOME and MPI_INC and the path of MPI_CXX (libmpi.so/libmpicxx.so) are printed to the screen when MPI is found

  -- MPI settings:
  --   MPI_HOME:/home/ompi
  --   MPI_INC: /home/ompi/include added
  --   MPI_CXX lib located: /home/ompi/lib/libmpi.so found

5. .json configuration file arguments

See the comments in the source file: <${workspaceFolder}/src/read_ini/settings/read_json.cpp>

6. Executable file options

| Option | Function | Type |
| --- | --- | --- |
| -domain | domain size: length, width, height | float |
| -run | domain resolution and number of steps: X_inner, Y_inner, Z_inner, nStepmax (if given) | int |
| -blk | initial local work-group size: dim_blk_x, dim_blk_y, dim_blk_z, DtBlockSize (if given) | int |
| -dev | device count and selection: device number, platform, device | int |
| -mpi | MPI Cartesian size: mx, my, mz | int |
| -mpi-s | "weak" or "strong" | std::string |
| -mpidbg | append the option, with or without a value, to enable MPI multi-rank debugging | just append |
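
The options above follow a -name=value pattern, and -mpidbg may also appear without a value. The following is a minimal sketch of how such an argument could be split into its name and value; Option and split_option are illustrative names, and the actual XFLUIDS parser may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical representation of one command-line option.
struct Option {
    std::string name;        // text between the leading '-' and the first '='
    std::string value;       // text after the first '=', empty if absent
    bool has_value = false;  // false for bare flags such as "-mpidbg"
};

// Splits an argument such as "-dev=1,1,0" or "-mpidbg".
// Throws std::invalid_argument if the argument does not start with '-'
// or if the option name is empty.
Option split_option(const std::string& arg) {
    if (arg.size() < 2 || arg[0] != '-') {
        throw std::invalid_argument("option must start with '-': " + arg);
    }
    Option opt;
    const std::size_t eq = arg.find('=');
    if (eq == std::string::npos) {
        opt.name = arg.substr(1);          // bare flag, no value
    } else {
        opt.name = arg.substr(1, eq - 1);  // skip the leading '-'
        opt.value = arg.substr(eq + 1);
        opt.has_value = true;
    }
    if (opt.name.empty()) {
        throw std::invalid_argument("empty option name: " + arg);
    }
    assert(!opt.name.empty());
    return opt;
}
```

Splitting only on the first '=' keeps names that contain a hyphen, such as mpi-s, intact; the comma-separated value can then be parsed according to the type listed for that option.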

7. Output data format

Set "OutDAT" or "OutVTI" to 1 in the .json file to enable the corresponding output format

7.1. Tecplot file

Import the .dat files of all ranks for one step for visualization; points overlap at the boundaries between ranks (visualization of 3D parallel Tecplot files is not supported; Tecplot is recommended for 1D visualization)

7.2. VTK file

Use ParaView to open the *.pvti files for MPI visualization (1D visualization is not supported; ParaView is recommended for 2D/3D visualization)

Cite XFLUIDS

@misc{li2024xfluids,
      title={XFLUIDS: A SYCL-based unified cross-architecture heterogeneous simulation solver for compressible reacting flows}, 
      author={Jinlong Li and Shucheng Pan},
      year={2024},
      eprint={2403.05910},
      archivePrefix={arXiv}
}

Acknowledgments

XFLUIDS has received financial support from the following funding sources:

The Guanghe foundation (Grant No. ghfund202302016412)
The National Natural Science Foundation of China (Grant No. 11902271)
