Parallel using MPI and OpenMP
cc -Wall rbbreakup.c -o rbb -O
./rbb -ichunk 2 -jchunk 2 -in input.txt
This will generate four files, input.txt.0, input.txt.1, input.txt.2 and input.txt.3, that will be processed by four MPI ranks.
Here, -ichunk specifies how many "columns" the grid is split into, and -jchunk how many "rows". When ichunk = 1 and jchunk = 1, the grid stays the same.
When ichunk = 1 and jchunk = 2, the division will look like this:
input.txt.0 |
---|
input.txt.1 |
When ichunk = 2 and jchunk = 2, the division will look like this:
input.txt.0 | input.txt.1 |
---|---|
input.txt.2 | input.txt.3 |
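The numbering in the tables above is consistent with a row-major layout: file indices increase along a row of blocks first, then continue on the next row. The following is a minimal sketch of that mapping, assuming the layout holds for other values of ichunk and jchunk as well. The names `block_pos`, `block_of_rank` and `block_filename` are hypothetical and are not taken from rbbreakup.c or stencil.c.

```c
#include <assert.h>
#include <stdio.h>

/* Position of one block in the ichunk x jchunk decomposition. */
typedef struct {
    int col; /* 0 .. ichunk-1, counts "columns" */
    int row; /* 0 .. jchunk-1, counts "rows" */
} block_pos;

/* Map a file/rank index to its block position.
 * ASSUMPTION: row-major numbering, as suggested by the tables above. */
block_pos block_of_rank(int rank, int ichunk, int jchunk)
{
    assert(ichunk > 0 && jchunk > 0);
    assert(rank >= 0 && rank < ichunk * jchunk);
    block_pos p = { rank % ichunk, rank / ichunk };
    return p;
}

/* Build "<base>.<rank>", e.g. "input.txt.3".
 * Returns 0 on success, -1 if the buffer is too small. */
int block_filename(char *buf, size_t len, const char *base, int rank)
{
    int n = snprintf(buf, len, "%s.%d", base, rank);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With ichunk = 2 and jchunk = 2, index 3 maps to column 1, row 1, which is the bottom-right block input.txt.3 in the table.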
mpicc -fopenmp -o s stencil.c
mpirun --bind-to none -n 4 ./s -inp input.txt -res output.txt -threads 4 -ichunk 2 -jchunk 2
Here, -n specifies the number of MPI processes (ranks), which must equal the number of subgrids (= files), i.e. ichunk * jchunk = n. -threads is the number of threads used by OpenMP. -ichunk and -jchunk are the numbers of blocks in each direction, the same values passed to rbb above. Best performance is achieved when threads == n.
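The constraint ichunk * jchunk = n is easy to get wrong when changing the decomposition, so it can be worth checking before any work starts. The following is a minimal sketch of such a check. `check_layout` is a hypothetical helper, not a function from stencil.c, and in an MPI program the `nranks` argument would typically come from `MPI_Comm_size`.

```c
#include <stdio.h>

/* Verify that the block layout matches the number of MPI processes.
 * Returns 0 if ichunk * jchunk == nranks, -1 otherwise. */
int check_layout(int nranks, int ichunk, int jchunk)
{
    if (ichunk < 1 || jchunk < 1 || ichunk * jchunk != nranks) {
        fprintf(stderr,
                "layout mismatch: ichunk=%d, jchunk=%d, but n=%d\n",
                ichunk, jchunk, nranks);
        return -1;
    }
    return 0;
}
```

For the command above, `check_layout(4, 2, 2)` returns 0, while running the same 2 x 2 layout with -n 2 would be rejected.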