/mtxreusedist

Program for computing reuse distance and similar metrics for sparse matrix-vector multiplication

Primary LanguageCGNU General Public License v3.0GPL-3.0

This is the README file for mtxreusedist, a program for computing
reuse distance and other metrics for data locality in connection with
sparse matrix-vector multiplication kernels.

  Copyright (C) 2023 James D. Trotter

  Copying and distribution of this file, with or without modification,
  are permitted in any medium without royalty provided the copyright
  notice and this notice are preserved.

Building
--------

The mtxreusedist program can be built with `make'. Compilation and
linking may be configured through the environment variable `CC', which
is used to choose a compiler, and `CFLAGS' and `LDFLAGS', which are
used to set compiler flags and linker flags, respectively. Here is an
example:

     make CC=gcc CFLAGS="-O3 -march=native"

If support for loading zlib-compressed Matrix Market files is needed,
then compile with -DHAVE_LIBZ and link to zlib:

     make CC=gcc CFLAGS="-O3 -march=native -DHAVE_LIBZ" LDFLAGS="-lz"


Usage
-----

mtxreusedist is used to load a matrix from a file in Matrix Market
format (see https://math.nist.gov/MatrixMarket/formats.html), convert
the sparse matrix to compressed sparse row (CSR) format, and then
compute reuse distance or similar metrics for a sparse matrix-vector
multiplication (SpMV) kernel using the given matrix.

Some parts of these computations can use OpenMP for shared-memory
parallelism. In this case, the environment variable `OMP_NUM_THREADS'
can be set to control the number of threads that are used. In
addition, `OMP_PROC_BIND' can be set to bind threads to particular
cores.

If the option `--verbose' is supplied, then some more information is
printed, such as how long time is spent loading the matrix and
computing the reference distance.

Here is an example showing how to compute a histogram of the reference
distance for the HB/1138_bus (https://sparse.tamu.edu/HB/1138_bus)
from the SuiteSparse Matrix Collection. First, download the matrix in
Matrix Market file format, and then use the following command:

    $ ./mtxreusedist --histogram --verbose 1138_bus.mtx >1138_bus.txt
    mtxfile_read: 0.003338 seconds (12.7 MB/s)
    csr_from_coo: 0.000119 seconds, 1,138 rows, 1,138 columns, 4,054 nonzeros, 2 to 18 nonzeros per row
    computing reuse/reference distance:
    done computing reuse/reference distance in 0.000131 seconds
    writing reuse/reference distance:
    done writing reuse/reference distance in 0.001446 seconds

The standard output is redirected to the file 1138_bus.txt, which now
contains a histogram of reference distances corresponding to accesses
to the source/input vector for the SpMV kernel.

The file consists of two entries for every line, which specify the
distance in the number of accesses (or in cache lines if the
`--line-size' option was used) and the number of accesses for that
particular distance. Here is an example:

    $ head 1138_bus.txt
    distance count
    0 2188
    1 455
    2 275
    3 180
    4 106
    5 88
    6 56
    7 49
    8 36
    ...

On the the last line, the distance is '∞', which that the
corresponding accesses to the source vector never experience reuse.

Note that distances are given in terms of accesses to the source
vector only. Recall that for rowwise SpMV kernels, such as those based
on matrices in CSR format, the only irregular memory accesses are to
the source/input vector. Accesses to column offsets and matrix
nonzeros are regular and they are never reused (at least within a
single SpMV iteration). Thus we are only concerned with the reuse of
the source/input vector.

The gnuplot script 1138_bus.gnu gives an example of how to use gnuplot
to visualise the reference distance histogram.


Copying
-------
mtxreusedist is free software. See the file COPYING for copying
conditions.