Nekbone
Nekbone solves a standard Poisson equation using a conjugate gradient iteration with a simple or spectral element multigrid preconditioner on a block or linear geometry. It exposes the principal computational kernel to reveal the essential elements of the algorithmic-architectural coupling that is pertinent to Nek5000.
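Nekbone itself is Fortran and applies the operator matrix-free, element by element. Purely as a hedged illustration of the iteration at its core, the following C++ sketch shows unpreconditioned conjugate gradients with the operator application left as a callback; the names cg, ax, and the overall structure are illustrative, not Nekbone's:

    // Hedged sketch (not Nekbone source): unpreconditioned conjugate
    // gradients for A x = b, with the matrix-vector product supplied
    // as a callback, as in a matrix-free spectral-element code.
    #include <cmath>
    #include <vector>

    using Vec = std::vector<double>;

    static double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    template <typename MatVec>   // MatVec: void(const Vec& in, Vec& out)
    void cg(MatVec ax, const Vec& b, Vec& x, int maxit, double tol) {
        const size_t n = b.size();
        x.assign(n, 0.0);
        Vec r = b, p = b, w(n);          // with x = 0, the residual r = b
        double rho = dot(r, r);
        for (int it = 0; it < maxit && std::sqrt(rho) > tol; ++it) {
            ax(p, w);                    // w = A p (matrix-free in Nekbone)
            const double alpha = rho / dot(p, w);
            for (size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * w[i]; }
            const double rho_new = dot(r, r);
            for (size_t i = 0; i < n; ++i) p[i] = r[i] + (rho_new / rho) * p[i];
            rho = rho_new;
        }
    }

The dominant cost in this loop is the operator application, which is the piece the OpenACC and CUDA variants described below accelerate.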
CUDA/OpenACC Branch
NOTE: This project is a fork of Nekbone that supports AMD GPUs when compiled with the Mentor Graphics compiler.
The CUDA/OpenACC branch contains GPU implementations of the conjugate gradient
solver. This includes a pure OpenACC implementation as well as a hybrid
OpenACC/CUDA implementation with a CUDA kernel for the matrix-vector
multiplication, based on previous work [1][2]. The code can also be compiled
without OpenACC or CUDA. These implementations were tested with the nek_gpu1
example in test/nek_gpu1 and with the PGI compilers.
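The actual Nekbone kernel applies the local stiffness matrix through tensor contractions with 1-D derivative matrices; as a hedged stand-in, the sketch below shows the general shape of a per-element CUDA matrix-vector kernel, with one thread block per spectral element. The name ax_elem, the npts parameter, and the stored dense per-element matrix are assumptions made for illustration:

    // Hedged sketch, not the Nekbone kernel: a dense per-element
    // matrix-vector product w_e = A_e u_e, one thread block per element.
    // (Nekbone's real kernel is matrix-free and uses tensor contractions
    // with 1-D derivative matrices instead of a stored A_e.)
    #include <cuda_runtime.h>

    __global__ void ax_elem(const double* __restrict__ A,  // nelt blocks of npts*npts
                            const double* __restrict__ u,  // nelt blocks of npts
                            double* __restrict__ w,        // nelt blocks of npts
                            int npts)
    {
        const int e = blockIdx.x;                     // element index
        const double* Ae = A + (size_t)e * npts * npts;
        const double* ue = u + (size_t)e * npts;
        double*       we = w + (size_t)e * npts;

        // Threads stride over the rows (grid points) of this element.
        for (int i = threadIdx.x; i < npts; i += blockDim.x) {
            double s = 0.0;
            for (int j = 0; j < npts; ++j)
                s += Ae[i * npts + j] * ue[j];
            we[i] = s;
        }
    }

    // Illustrative launch: one block per element, e.g.
    //   ax_elem<<<nelt, 128>>>(d_A, d_u, d_w, npts);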
Compiling the nek_gpu1 example
The nek_gpu1 example contains three makenek scripts to compile Nekbone with various configurations:
- makenek.mpi: MPI only, without OpenACC or CUDA
- makenek.acc: MPI and OpenACC
- makenek.cuda: MPI and hybrid OpenACC/CUDA
The different build configurations are determined by the presence/absence of the -acc and -Mcuda flags, as demonstrated in the scripts. These scripts will work out-of-the-box provided that:
- mpicc, mpif77, and mpif90 are set to suitable compilers in your environment (i.e., MPI compilers that use PGI or Cray)
- You are compiling for a compute-capability 5.0 or 6.0 GPU
If these defaults are not suitable, you may manually edit the relevant settings in the script. You may also compile without MPI by setting the IFMPI option in makenek.
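The -acc flag also drives compile-time branching in the source: OpenACC-conforming compilers define the standard _OPENACC macro whenever the flag is active. As a hedged illustration (not taken from the Nekbone sources, which are Fortran and use their own guards), a host-side check might look like:

    // Hedged illustration of compile-time build detection. _OPENACC is
    // the standard macro set by OpenACC compilers when -acc is in effect;
    // __CUDACC__ is set by NVIDIA's CUDA compiler. (The -Mcuda flag used
    // by makenek.cuda is PGI's CUDA Fortran switch, so this C++ analogy
    // is loose.)
    #include <cstdio>

    int main() {
    #ifdef _OPENACC
        std::printf("built with OpenACC (_OPENACC = %d)\n", _OPENACC);
    #else
        std::printf("built without OpenACC\n");
    #endif
    #ifdef __CUDACC__
        std::printf("compiled as CUDA\n");
    #endif
        return 0;
    }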
To use the compilation scripts (for example, with OpenACC only):
$ cd test/nek_gpu1
$ ./makenek.acc clean
$ ./makenek.acc
The clean step is necessary if you are switching between MPI, OpenACC, and OpenACC/CUDA builds.
Running nek_gpu1
Successful compilation will produce an executable called nekbone. You may run
the executable through an MPI process manager, for example:
$ mpiexec -n 1 ./nekbone data
The data argument specifies that you will be using the runtime settings from
data.rea. See USERGUIDE.pdf for a detailed description of the .rea file format.
Testing nek_gpu1
The test/nek_gpu1/scripts/ folder contains simple scripts for comparing
results from the different implementations:
- compare_mpi_master.sh: Runs and compares the MPI solution to the reference solution from the Nekbone master branch
- compare_acc_master.sh: Runs and compares the OpenACC solution to the reference solution
- compare_cuda_master.sh: Runs and compares the OpenACC/CUDA solution to the reference solution
- compare_mpi_acc.sh: Runs and compares the MPI and OpenACC solutions
- compare_mpi_cuda.sh: Runs and compares the MPI and CUDA solutions
To use them:
$ cd test/nek_gpu1
$ scripts/compare_acc_master.sh
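The scripts' internals are not described in this README; as a hedged sketch of the kind of check such a comparison can reduce to, the program below reads two whitespace-separated numeric outputs and reports the largest absolute difference. The file layout and the 1e-8 tolerance are assumptions, not details of the actual scripts:

    // Hedged sketch: compare two numeric output files value by value
    // and report the maximum absolute difference.
    #include <cmath>
    #include <cstdio>

    int main(int argc, char** argv) {
        if (argc != 3) { std::fprintf(stderr, "usage: %s a.txt b.txt\n", argv[0]); return 2; }
        FILE* fa = std::fopen(argv[1], "r");
        FILE* fb = std::fopen(argv[2], "r");
        if (!fa || !fb) { std::fprintf(stderr, "cannot open inputs\n"); return 2; }
        double a, b, maxdiff = 0.0;
        while (std::fscanf(fa, "%lf", &a) == 1 && std::fscanf(fb, "%lf", &b) == 1)
            maxdiff = std::fmax(maxdiff, std::fabs(a - b));
        std::fclose(fa); std::fclose(fb);
        std::printf("max |a-b| = %g\n", maxdiff);
        return maxdiff < 1e-8 ? 0 : 1;   // tolerance is an assumption
    }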
Modifying nek_gpu1
The nek_gpu1 problem is set up for 256 elements and a polynomial order of 16.
You may alter this setup in the SIZE and data.rea files, as described in
USERGUIDE.pdf.
Compile/Run on Other Systems
- Compile/Run for MPI only on Titan (16 MPI ranks, 1 node):
$ cd test/nek_gpu1
$ cp ../../bin/makenek.mpi.titan .
$ cp ../../bin/nekpmpi.titan .
$ ./makenek.mpi.titan
$ ./nekpmpi.titan data 16 1
- Compile/Run for MPI and OpenACC on Titan (1 GPU, 1 node):
$ cd test/nek_gpu1
$ cp ../../bin/makenek.acc.titan .
$ cp ../../bin/nekpgpu.titan .
$ ./makenek.acc.titan
$ ./nekpgpu.titan data 1 1
- Compile/Run for MPI and hybrid OpenACC/CUDA on Titan (1 GPU, 1 node):
$ cd test/nek_gpu1
$ cp ../../bin/makenek.cuda.titan .
$ cp ../../bin/nekpgpu.titan .
$ ./makenek.cuda.titan
$ ./nekpgpu.titan data 1 1
References
[1] Jing Gong, Stefano Markidis, Erwin Laure, Matthew Otten, Paul Fischer, and Misun Min,
"Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations,"
The Journal of Supercomputing, Vol. 72, pp. 4160-4180, 2016.
[2] Matthew Otten, Jing Gong, Azamat Mametjanov, Aaron Vose, John Levesque, Paul
Fischer, and Misun Min, "An MPI/OpenACC implementation of a high-order
electromagnetics solver with GPUDirect communication," The International Journal
of High Performance Computing Applications, Vol. 30, No. 3, pp. 320-334, 2016.