ddemidov/amgcl

Best matrix adapter for BCSR matrix data?

GonzaloSaez opened this issue · 8 comments

Hi all,

I'd like to test this library to solve a block linear system. Currently, my block linear system is stored in block CSR format, as in AmgX. In this format there is only one row_ptr and col_idx entry per block, and the matrix values are stored in row-major order within each block, i.e. the entry of block b with sub-indices (i,j) is stored in data[b * BlockSize * BlockSize + i * BlockSize + j]. What is the most convenient way to initialize such matrices in AmgCL (provided that I already have the matrix data in memory)? Most examples show how to read matrices from .mtx files, but not how to directly set block matrix data. Also, it seems that some adapters such as block_matrix_adapter expect one row_ptr and col_idx entry per scalar index rather than per block. Finally, does AmgCL require columns to be ordered within a given row?

Thank you for your help.

It looks like your matrix layout is equivalent to amgcl::backend::crs<amgcl::static_matrix<double, BlockSize, BlockSize>>. You should be able to use the amgcl::adapter::zero_copy adapter after you reinterpret_cast your val array to static_matrix<>*.
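
A minimal sketch of what that could look like for the BCSR layout described above. The names n_block_rows, ptr, col, and val stand in for your own arrays, the helper is hypothetical, and the exact zero_copy signature should be checked against the amgcl headers:

#include <cstddef>
#include <amgcl/backend/builtin.hpp>
#include <amgcl/value_type/static_matrix.hpp>
#include <amgcl/adapter/zero_copy.hpp>

// Hypothetical helper: adapt raw BCSR arrays (one ptr/col entry per block,
// row-major values within each block) without copying them.
template <int BlockSize>
auto adapt_bcsr(ptrdiff_t n_block_rows,
        const ptrdiff_t *ptr, const ptrdiff_t *col, const double *val)
{
    typedef amgcl::static_matrix<double, BlockSize, BlockSize> block;

    // The row-major per-block layout matches the memory layout of
    // static_matrix, so the value array can be reinterpreted in place.
    return amgcl::adapter::zero_copy(n_block_rows, ptr, col,
            reinterpret_cast<const block*>(val));
}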

Most examples show how to read matrices from .mtx files, but not how to directly set block matrix data.

Here is such an example assembling the system matrix for a Poisson problem (the system is scalar in this case):

// Generates the matrix for a Poisson problem in a unit cube
// (only the function signature is shown here).
template <typename ValueType, typename ColType, typename PtrType, typename RhsType>
int sample_problem(
        ptrdiff_t n,
        std::vector<ValueType> &val,
        std::vector<ColType>   &col,
        std::vector<PtrType>   &ptr,
        std::vector<RhsType>   &rhs,
        double anisotropy = 1.0
        );
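
For a fully self-contained illustration of setting matrix data directly (plain C++ rather than amgcl code), here is a minimal sketch that assembles the standard tridiagonal 1D Poisson matrix into CSR arrays:

#include <vector>
#include <cstddef>

// Assemble the tridiagonal 1D Poisson matrix of size n in CSR form.
void assemble_poisson_1d(ptrdiff_t n,
        std::vector<ptrdiff_t> &ptr, std::vector<ptrdiff_t> &col,
        std::vector<double> &val)
{
    ptr.clear(); col.clear(); val.clear();
    ptr.reserve(n + 1); col.reserve(3 * n); val.reserve(3 * n);

    ptr.push_back(0);
    for (ptrdiff_t i = 0; i < n; ++i) {
        if (i > 0)     { col.push_back(i - 1); val.push_back(-1.0); }
                         col.push_back(i);     val.push_back( 2.0);
        if (i < n - 1) { col.push_back(i + 1); val.push_back(-1.0); }

        // ptr[i + 1] points one past the last nonzero of row i.
        ptr.push_back(static_cast<ptrdiff_t>(col.size()));
    }
}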

it seems that some adapters such as block_matrix_adapter expect one row_ptr and col_idx entry per scalar index rather than per block

That is because the block matrix adapter converts the scalar matrix representation to a block one.

Finally, does AmgCL require columns to be ordered within a given row?

AMGCL solvers copy and sort the input matrix (unless you use the zero_copy adapter, in which case you need to make sure the columns are ordered within each row). Some other components, such as the block matrix adapter, also require a sorted matrix.
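
If your assembled matrix might have unsorted columns, a small standalone sketch (plain C++, not an amgcl facility) of sorting each CSR row in place could look like this:

#include <algorithm>
#include <utility>
#include <vector>
#include <cstddef>

// Sort the column indices (and the matching values) within each CSR row.
void sort_csr_rows(const std::vector<ptrdiff_t> &ptr,
        std::vector<ptrdiff_t> &col, std::vector<double> &val)
{
    for (size_t i = 0; i + 1 < ptr.size(); ++i) {
        ptrdiff_t beg = ptr[i], end = ptr[i + 1];

        // Gather the (column, value) pairs of the row, sort by column,
        // then scatter them back.
        std::vector<std::pair<ptrdiff_t, double>> row;
        row.reserve(end - beg);
        for (ptrdiff_t k = beg; k < end; ++k)
            row.emplace_back(col[k], val[k]);

        std::sort(row.begin(), row.end());

        for (ptrdiff_t k = beg; k < end; ++k) {
            col[k] = row[k - beg].first;
            val[k] = row[k - beg].second;
        }
    }
}

For a block-valued matrix the same idea applies, with the value type replaced by the block type.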

Hi @ddemidov ,

Thank you for the detailed and quick answer. I followed your suggestions and it worked perfectly. Thanks again. If you don't mind, I have some extra questions about this library:

  1. To define distributed matrices, we just need to provide the number of local rows, and the col vector must refer to global column indices, correct?
  2. Is it possible to output an equation-based error norm instead of the block error norm?
  3. Which backend would you recommend for the BCSR format on GPUs that is still compatible with most algorithms (i.e. supports row_iteration): VexCL or CUDA?
  4. For schur_pressure_correction-type preconditioners, is it possible to reuse the BCSR format that I described in the first post to define the pressure and velocity entries (i.e. not have separate matrices for the velocity and pressure)?

Thanks!

To define distributed matrices, we just need to provide the number of local rows, and the col vector must refer to global column indices, correct?

Correct, see more here: https://amgcl.readthedocs.io/en/latest/tutorial/poisson3DbMPI.html
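
As a rough, hedged illustration of that convention (the contiguous row split and the names below are my own; the linked tutorial is the authoritative reference), each rank assembles only its own strip of rows while the column indices stay global:

#include <algorithm>
#include <cstddef>

// Contiguous partition of n_global rows over nranks processes.
// Rank `rank` owns the global rows [row_beg, row_end); it assembles
// ptr/col/val for those rows only, with col[] holding *global* indices.
// The number of local rows (row_end - row_beg) together with these arrays
// is then handed to the MPI solver, as in the tutorial.
void local_row_range(ptrdiff_t n_global, int rank, int nranks,
        ptrdiff_t &row_beg, ptrdiff_t &row_end)
{
    ptrdiff_t chunk = (n_global + nranks - 1) / nranks;
    row_beg = std::min(n_global, chunk * rank);
    row_end = std::min(n_global, row_beg + chunk);
}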

Is it possible to output an equation-based error norm instead of the block error norm?

No, it is not possible. Each solver defines the error norm as sqrt(math::norm(inner_product(e,e))), and math::norm for amgcl::static_matrix is defined as the Frobenius norm. The norm has to return a scalar for the generic solver code to work.

Which backend would you recommend for the BCSR format on GPUs that is still compatible with most algorithms (i.e. supports row_iteration): VexCL or CUDA?

VexCL is the only GPGPU backend in amgcl that currently supports block value types. Neither GPU backend supports row_iteration, so e.g. gauss_seidel relaxation is not available. However, ILU relaxation is possible, with the triangular solves approximated by a couple of Jacobi iterations.
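
For reference, a rough sketch of setting up a VexCL backend with 4x4 block values, based on my reading of the amgcl VexCL tutorials (treat the exact calls, in particular enable_static_matrix_for_vexcl, as assumptions to verify against the documentation):

#include <vexcl/vexcl.hpp>

#include <amgcl/backend/vexcl.hpp>
#include <amgcl/backend/vexcl_static_matrix.hpp>
#include <amgcl/value_type/static_matrix.hpp>

int main() {
    // Pick a compute device and register the block value type with VexCL.
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));
    amgcl::backend::enable_static_matrix_for_vexcl(ctx);

    // Block-valued VexCL backend; the backend parameters carry the command queues.
    typedef amgcl::static_matrix<double, 4, 4> block;
    typedef amgcl::backend::vexcl<block> Backend;

    Backend::params bprm;
    bprm.q = ctx;
}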

For schur_pressure_correction-type preconditioners, is it possible to reuse the BCSR format that I described in the first post to define the pressure and velocity entries (i.e. not have separate matrices for the velocity and pressure)?

Yes, you need to define separate backends for the pressure and velocity solvers, where one of the backends uses block values. Here is an example: https://github.com/ddemidov/cppstokes_benchmarks/blob/master/amgcl_spc_pre.cpp (see https://arxiv.org/abs/2006.06052 for more details).
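
A rough sketch of the corresponding type setup, modeled on the linked example (the specific component choices here, ilu0 for the velocity block, smoothed aggregation AMG for the pressure block, and fgmres on the outside, are illustrative rather than prescriptive):

#include <amgcl/backend/vexcl.hpp>
#include <amgcl/backend/vexcl_static_matrix.hpp>
#include <amgcl/value_type/static_matrix.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/relaxation/ilu0.hpp>
#include <amgcl/relaxation/as_preconditioner.hpp>
#include <amgcl/solver/preonly.hpp>
#include <amgcl/solver/fgmres.hpp>
#include <amgcl/preconditioner/schur_pressure_correction.hpp>

// Block-valued backend for the velocity part, scalar backend for the pressure part.
typedef amgcl::static_matrix<double, 3, 3> vblock;
typedef amgcl::backend::vexcl<vblock>  UBackend;
typedef amgcl::backend::vexcl<double>  PBackend;

typedef amgcl::make_solver<
    amgcl::preconditioner::schur_pressure_correction<
        // Approximate velocity solve: a single ILU(0)-preconditioned iteration.
        amgcl::make_solver<
            amgcl::relaxation::as_preconditioner<UBackend, amgcl::relaxation::ilu0>,
            amgcl::solver::preonly<UBackend>
            >,
        // Approximate pressure solve: a single AMG-preconditioned iteration.
        amgcl::make_solver<
            amgcl::amg<PBackend,
                amgcl::coarsening::smoothed_aggregation,
                amgcl::relaxation::spai0
                >,
            amgcl::solver::preonly<PBackend>
            >
        >,
    amgcl::solver::fgmres<PBackend>
    > Solver;

The velocity solver runs on the block-valued backend while the pressure solver and the outer Krylov solver run on the scalar one, which is the separation of backends mentioned above.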

Thanks a lot @ddemidov, I got it to work with the VexCL backend. Closing the issue for now.

Hi Denis, sorry to bump this post.

I'm having some trouble with static_matrix value backends + MPI + aggregation using the configuration you showed me in this issue. I'm using the following backends and MPI solver/matrix (sorry for the lack of formatting):

using MatType = amgcl::static_matrix<float, 4, 4>;
using VecType = amgcl::static_matrix<float, 4, 1>;
using SBackend = amgcl::backend::builtin<MatType>;
using Solver = amgcl::mpi::make_solver<
    amgcl::runtime::mpi::preconditioner<SBackend>,
    amgcl::runtime::mpi::solver::wrapper<SBackend>>;
auto val_cast = reinterpret_cast<MatType*>(vals());
auto mat_cast = amgcl::adapter::zero_copy(..., ..., ..., val_cast);
mat_ = std::make_shared<amgcl::mpi::distributed_matrix<SBackend>>(world_, *mat_cast);

Since I'm using the zero_copy adapter, the column indices of the matrix are sorted according to their global id (and not their local id).

To avoid linear and non-linear solver divergence with AMG I had to add
precond.coarsening.aggr.block_size=4
to the aggregation preconditioner params (I'm solving the incompressible NS equations with block size = 4). This improves non-linear convergence a lot, and the solver results seem on par with those obtained using other linear algebra libraries. However, when using this option with more than one MPI rank, AMGCL outputs the following error:

Matrix size should be divisible by block_size

Is this expected? Could this be a bug related to global column indexing on my end? The error disappears if I use
precond.coarsening.aggr.block_size=1
in the params.

Also, should the aggregation method figure out the block size from the static_matrix parameters? Or is the block_size param referring to the aggregation size in terms of block units?

Thank you for your help.

the column indices of the matrix are sorted according to their global id

That is the way it should be.

Matrix size should be divisible by block_size

You should either use a block-valued backend or set block_size>1, but not both. With a block-valued backend, block_size should be equal to 1 (I am not even sure block_size>1 is supported in the MPI version of the preconditioner). I don't know why it improves the convergence, but you need to use some other tuning parameters.
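
In other words, a small sketch of the two mutually exclusive configurations, using the parameter path already mentioned above (the helper function is made up for illustration):

#include <boost/property_tree/ptree.hpp>

// Hypothetical helper returning the aggregation settings for either setup.
boost::property_tree::ptree aggregation_params(bool block_valued_backend) {
    boost::property_tree::ptree prm;

    if (block_valued_backend) {
        // Block-valued backend (e.g. builtin<static_matrix<float, 4, 4>>):
        // the unknowns are already blocks, so keep the default block_size = 1.
        prm.put("precond.coarsening.aggr.block_size", 1);
    } else {
        // Scalar backend (e.g. builtin<float>): let the aggregation group
        // every 4 scalar unknowns into a block instead.
        prm.put("precond.coarsening.aggr.block_size", 4);
    }

    return prm;
}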

should the aggregation method figure out the block size from the static_matrix parameters? Or is the block_size param referring to the aggregation size in terms of block units?

All algorithms in amgcl are expressed in terms of the value_type of the backend, including the coarsening, so the block_size is expressed in terms of blocks.

Thanks @ddemidov, I got it more or less working with MPI.

I suppose that you'd recommend CPR to cope with the classical incompressible Navier-Stokes coupled linear system, right? I've found that the eps_strong value tends to have an important influence on the solution. Which value of this parameter would you recommend?

Also, I've found some things that you may find interesting:

  1. In MPI, rebuilding the preconditioner using the syntax solver_ = std::make_shared<Solver>(world_, mat_, prm); tends to give a SIGSEGV related to the destructor of world_, although the UB may come from elsewhere. Have you ever noticed that?
  2. It seems that the MPI CPR implementation is not consistent with the serial one in some parts. The MPI CPR implementation does its own inverse computation by hand instead of calling the backend inverse method, and the static_assert regarding backend types fails with MPI CPR while it works with serial CPR. Is the MPI CPR version compatible with static_matrix value types? I suppose not, but I just wanted to make sure.

Thanks again for your help!

I suppose that you'd recommend CPR to cope with the classical incompressible Navier-Stokes coupled linear system, right?

Maybe the Schur complement pressure correction would work better here (see https://arxiv.org/abs/2006.06052); I think CPR was specifically developed for black-oil models.

In MPI, rebuilding the preconditioner using the syntax solver_ = std::make_shared<Solver>(world_, mat_, prm); tends to give a SIGSEGV related to the destructor of world_, although the UB may come from elsewhere. Have you ever noticed that?

I haven't really tested rebuilding on MPI (sorry). Can you provide a minimal code example reproducing this?

It seems that the MPI CPR implementation is not consistent with the serial one in some parts. The MPI CPR implementation does its own inverse computation by hand instead of calling the backend inverse method, and the static_assert regarding backend types fails with MPI CPR while it works with serial CPR. Is the MPI CPR version compatible with static_matrix value types? I suppose not, but I just wanted to make sure.

I don't touch the CPR code very often, so could you please point me to the specific code you are talking about? It does look like the serial CPR was updated more recently than the MPI one. It also looks like the MPI version only supports scalar matrices.