/grid2grid

A library transforming between two arbitrary grid-like matrix data layouts over MPI ranks.

Primary LanguageC++BSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

grid2grid

This is a library that transforms a matrix between two arbitrary grid-like data layouts. By layout, we mean the way in which a matrix is distributed over MPI ranks.

For example, imagine a matrix split into four blocks, such that each block resides on a different rank as shown below:

+---------------------+
|       |             |
|       |             |
|   0   |      1      |
|       |             |
|       |             |
+---------------------+
|       |             |
|   2   |      3      |
|       |             |
+---------------------+

Assume that we want to redistribute the matrix, to achieve the following data layout:

+---------------------+
|   0   |    2    | 1 |
|       |         |   |
+---------------------+
|   1   |    0    | 1 |
|       |         |   |
+---------------------+
|       |         |   |
|   2   |    2    | 0 |
|       |         |   |
+---------------------+

This is exaclty what this library is doing!

Features:

  • initial and final layouts do not have to be block-cyclic, but any grid-like data layout.
  • blocks do not have to be all the same dimensions, but can have different dimensions.
  • a rank might own arbitrary number of blocks
  • the type of entries in a matrix can be arbitrary
  • blocks that belong to the same rank do not have to be consecutive in memory
  • each block can have an arbitrary stride
  • OpenMP support

Performance:

When tested against ScaLAPACK (as provided by Intel MKL), this library outperforms the MKL's pdgemr2d routine that transforms between two block-cyclic data layouts by up to 50%, as shown below:

Algorithm

The pipeline of the algorithm is roughly the following:

  • Find the intersections of the initial and final grids (called grid cover) in one pass, which decomposes initial blocks into smaller blocks.
  • Sort decomposed initial blocks based on the rank to which they should be sent. Thus, if some ranks should exchange more than one block, they will do it within a single message.
  • Copy decomposed intial blocks from a local storage (with arbitrary local data layout) to a temporary MPI buffer.
  • Perform MPI_Alltoallv.
  • Use the computed grid cover to decompose blocks from the final grid into smaller blocks.
  • Sort decomposed final blocks based on the receiver rank.
  • All smaller blocks that are received are copied from a temporary MPI buffer to the local storage (with arbitrary local data layout).

Building and Installing

Assuming that you want to use the gcc 8 compiler and OpenMP, you can build the project as follows:

# clone the repo
git clone --recursive https://github.com/kabicm/grid2grid
cd grid2grid
mkdir build
cd build

# build
CC=gcc-8 CXX=g++-8 cmake -DCMAKE_BUILD_TYPE=Release -DWITH_OPENMP=TRUE ..

# compile
make -j 4

Example

To start with, there is a small example that transforms the matrix between two block-cyclic layouts. This example can be run from build directory as follows:

mpirun --oversubscribe -np 4 ./examples/scalapack2scalapack -m 10 -n 10 -ibm 2 -ibn 3 -fbm 3 -fbn 5 -pm 2 -pn 2

Where flags have the following meaning:

  • (m, n): Dimensions of matrix that we want to change the layout for.
  • (ibm, ibn): Dimensions of initial blocks, determining the initial block-cyclic distribution.
  • (fbm, fbn): Dimensions of final blocks, determining the final block-cyclic distribution that we want to reach.
  • (pm, pn): Processor grid, determining the processor decomposition for the block-cyclic distribution.

Arbitrary Grid-Like Data Layouts

To transform between two arbitrary grid-like data layouts, we need to construct two grid_layout objects, one describing the initial layout and one describing the final layout. After that, we just invoke:

grid2grid::grid_layout initial_layout(...);
grid2grid::grid_layout final_layout(...);
grid2grid::transform(initial_layout, final_layout, MPI_COMM_WORLD);

In order to create a grid_layout object, we need to provide the following information:

  • grid2D: small struct describing the grid, basically 2 vectors, one describing where rows are split and one describing where columns are split in a grid.
  • owners: a matrix that specifies the owner (i.e. the rank) of each block in grid2D.
  • local_blocks: a vector of blocks that current rank owns. Each block is defined with a pointer to the beginning of that block in the local memory of current rank, a stride (by default equal to the number of rows of a block) and dimensions.

This is done for ScaLAPACK block-cyclic data layout, which can serve as an example of how grid_layout object can be created.

Author

Marko Kabic (marko.kabic@cscs.ch)