/multigrid

Multigrid experiments with the deal.II library

Primary LanguageCudaGNU Lesser General Public License v2.1LGPL-2.1

Multigrid experiments with the deal.II library

This repository contains a collection of example programs demonstrating matrix-free multigrid solvers with the deal.II finite element library. The code in this collection is based on an extension of the step-37 tutorial program of deal.II.

Set of experiments

This repository contains the following experiments:

  • poisson_cube: Solves the Poisson equation on a cube, running both a full multigrid cycle and a conjugate gradient solver preconditioned by a V-cycle.

  • poisson_shell: Solves the Poisson equation with strongly varying coefficient on a shell, running both a full multigrid cycle and a conjugate gradient solver preconditioned by a V-cycle. Note that Kronbichler and Wall (2018) report a relatively low number of iterations on a similar example compared to what is observed here, which is due to a particular analytic solution.

  • poisson_l: Solves the Poisson equation on an L-shaped domain with a corner singularity using adaptive meshes.

  • minimal_surface: Solves the nonlinear minimal surface equation, which is a particular variable-coefficient Poisson equation

Run an experiment

To run an experiment, you need to first configure the test case against a deal.II installation. To obtain a compiled version of deal.II, see at the bottom of this page. Go into a directory, e.g. poisson_cube, and run the following:

cmake -D DEAL_II_DIR=/path/to/dealii/install -D CMAKE_BUILD_TYPE=Release .

The build type defines that we intend to run the program in release mode (with compiler optimizations and without debugging information). Whenever you do development of the program, it is recommended to use a Debug build (-D CMAKE_BUILD_TYPE=Debug) because deal.II has a large number of assertions that help to track down errors.

Running the program e.g. on a 12-core Intel Xeon E5-2687W v4 (Broadwell), the following output can be obtained for the cube example:

$ mpirun -n 12 -bind-to core ./program 4 150000000 2 2 2 square
Settings of parameters:
Polynomial degree:              4
Maximum size:                   150000000
Number of MG cycles in V-cycle: 2
Number of pre-smoother iters:   2
Number of post-smoother iters:  2
Use doubling mesh:              0

Testing FE_Q<3>(4)
Cycle 0
Number of degrees of freedom: 125 = (1 x 4 + 1)^3
Total setup time:      0.00313138s
Time compute rhs:      0.00776478
Time initial smoother: 4.5829e-05
Memory stats [MB]: 126.004 [p2] 126.341 128.035 [p8]
Time solve   (CPU/wall)    0.002141s/0.00853511s
Time solve   (CPU/wall)    9.8e-05s/0.000154858s
Time solve   (CPU/wall)    0.0001s/0.00015684s
Time solve   (CPU/wall)    0.000152s/0.000157359s
Time solve   (CPU/wall)    0.000152s/0.000155314s
All solver time 0.0091177 [p0] 0.00919578 0.00922101 [p2]
Coarse solver 12 times: 0.00920122 tot prec 0
level  smoother    mg_mv     mg_vec    restrict  prolongate  inhomBC
Coarse solver 3 times: 1.6763e-05 tot prec 0.00040683
level  smoother    mg_mv     mg_vec    restrict  prolongate  inhomBC
matvec time dp 3.18e-07 [p9] 1.3557e-06 8.1292e-06 [p11] DoFs/s: 1.5377e+07
matvec time dp 3.1774e-07 [p9] 1.3338e-06 7.9705e-06 [p11] DoFs/s: 1.5683e+07
matvec time dp 3.1548e-07 [p9] 1.3489e-06 7.9662e-06 [p11] DoFs/s: 1.5691e+07
matvec time dp 3.1542e-07 [p9] 1.3236e-06 7.926e-06 [p11] DoFs/s: 1.5771e+07
matvec time dp 3.148e-07 [p2] 1.2615e-06 7.3561e-06 [p11] DoFs/s: 1.6993e+07
Best timings for ndof = 125   mv 7.3561e-06    mv smooth 7.5114e-06   mg 0.00014986
L2 error with ndof = 125  1.1797  with CG 1.1797

...


Cycle 23
Number of degrees of freedom: 135005697 = (128 x 4 + 1)^3
Total setup time:      11.333s
Time compute rhs:      8.977
Time initial smoother: 5.9028
Memory stats [MB]: 2347.3 [p3] 2375.3 2428.2 [p1]
Time solve   (CPU/wall)    8.0494s/8.3143s
Time solve   (CPU/wall)    3.6149s/3.6178s
Time solve   (CPU/wall)    3.6129s/3.6138s
Time solve   (CPU/wall)    3.6099s/3.6128s
Time solve   (CPU/wall)    3.6154s/3.6163s
error start         level 1: 1.3006
residual norm start level 1: 27.4
residual norm end   level 1: 0.32679
error end           level 1: 0.17372
error start         level 2: 0.17468
residual norm start level 2: 9.5456
residual norm end   level 2: 0.24821
error end           level 2: 0.011664
error start         level 3: 0.015083
residual norm start level 3: 0.73235
residual norm end   level 3: 0.01274
error end           level 3: 0.00040369
error start         level 4: 0.00051142
residual norm start level 4: 0.03154
residual norm end   level 4: 0.00040747
error end           level 4: 1.268e-05
error start         level 5: 1.6539e-05
residual norm start level 5: 0.0013008
residual norm end   level 5: 2.0988e-05
error end           level 5: 4.2626e-07
error start         level 6: 5.4339e-07
residual norm start level 6: 5.5035e-05
residual norm end   level 6: 1.0401e-06
error end           level 6: 1.3769e-08
error start         level 7: 1.7362e-08
residual norm start level 7: 2.4024e-06
residual norm end   level 7: 4.7265e-08
error end           level 7: 4.342e-10
All solver time 22.774 [p5] 22.775 22.775 [p1]
Coarse solver 96 times: 0.02376 tot prec 0
level  smoother    mg_mv     mg_vec    restrict  prolongate  inhomBC
L1     0.003677    0.0007593 0.0002591 0.0008119 0.0009744   7.357e-05
L2     0.004806    0.00097   0.0002615 0.001109  0.0009335   2.076e-05
L3     0.01286     0.002621  0.0004627 0.001674  0.001217    3.812e-05
L4     0.06815     0.01343   0.001536  0.004285  0.004828    0.0001227
L5     0.5057      0.07605   0.02739   0.02451   0.03663     0.0005227
L6     3.209       0.432     0.2977    0.1762    0.3602      0.002975
L7     14.74       2.066     1.749     0.6926    1.824       0.01259
Coarse solver 8 times: 0.00024006 tot prec 11.166
level  smoother    mg_mv     mg_vec    restrict  prolongate  inhomBC
L1     0.003804    6.966e-05 1.968e-05 0.0003543 9.869e-05   0
L2     0.003023    0.000139  1.973e-05 0.0002264 0.000119    0
L3     0.003226    0.0003972 2.809e-05 0.0004738 0.0001548   0
L4     0.0183      0.002385  0.000139  0.002703  0.0007166   0
L5     0.1235      0.02776   0.002822  0.008956  0.00642     0
L6     0.8902      0.1219    0.04046   0.0897    0.08435     0
L7     6.732       0.7539    1.08      0.5136    0.6526      0
matvec time dp 0.15706 [p10] 0.15726 0.15748 [p0] DoFs/s: 8.5731e+08
matvec time dp 0.15469 [p8] 0.15473 0.15475 [p1] DoFs/s: 8.724e+08
matvec time dp 0.15448 [p8] 0.15452 0.15455 [p4] DoFs/s: 8.7355e+08
matvec time dp 0.15477 [p11] 0.15482 0.15486 [p1] DoFs/s: 8.7177e+08
matvec time dp 0.15488 [p8] 0.1549 0.15494 [p1] DoFs/s: 8.7135e+08
Best timings for ndof = 135005697   mv 0.15455    mv smooth 0.092618   mg 3.6128
L2 error with ndof = 135005697  4.342e-10  with CG 4.2068e-10

Cycle 24
Number of degrees of freedom: 263374721 = (160 x 4 + 1)^3
Total setup time:      19.549s
Max size reached, terminating.

 cells    dofs    mv_outer  mv_inner  reduction  fmg_L2error   fmg_time    cg_L2error    cg_time  cg_its cg_reduction
1       125       7.356e-06 7.511e-06 1.000e+00 1.180e+00 -    1.499e-04 1.180e+00 -    9.667e-04 3      1.262e-04
8       729       1.264e-05 6.969e-06 1.092e-01 1.737e-01 2.76 7.675e-04 1.725e-01 2.77 2.569e-03 8      5.677e-02
27      2197      9.923e-06 8.709e-06 1.000e+00 4.102e-02 3.56 6.810e-04 4.102e-02 3.54 1.234e-03 3      3.157e-04
64      4913      1.264e-05 9.717e-06 1.613e-01 1.166e-02 4.37 8.097e-04 1.027e-02 4.81 2.154e-03 8      6.789e-02
125     9261      1.882e-05 1.528e-05 1.000e+00 3.902e-03 4.91 1.971e-03 3.902e-03 4.34 3.241e-03 3      4.421e-04
216     15625     3.129e-05 2.267e-05 1.818e-01 2.164e-03 3.23 1.813e-03 1.145e-03 6.73 4.744e-03 8      6.134e-02
343     24389     3.721e-05 2.675e-05 1.000e+00 7.057e-04 7.27 4.772e-03 7.057e-04 3.14 7.448e-03 3      5.026e-04
512     35937     5.361e-05 3.688e-05 1.319e-01 4.037e-04 4.18 2.032e-03 3.822e-04 4.59 5.333e-03 8      6.689e-02
1000    68921     1.007e-04 6.069e-05 1.202e-01 1.246e-04 5.27 4.994e-03 1.316e-04 4.78 1.297e-02 8      6.575e-02
1728    117649    1.611e-04 1.103e-04 1.250e-01 5.413e-05 4.57 4.690e-03 5.423e-05 4.86 1.251e-02 8      6.828e-02
2744    185193    2.365e-04 1.588e-04 1.142e-01 2.418e-05 5.23 1.249e-02 2.546e-05 4.90 3.190e-02 8      6.722e-02
4096    274625    3.513e-04 2.322e-04 1.137e-01 1.268e-05 4.83 7.570e-03 1.319e-05 4.93 2.336e-02 8      6.948e-02
8000    531441    5.852e-04 3.784e-04 1.163e-01 4.252e-06 4.90 1.565e-02 4.370e-06 4.95 4.683e-02 8      6.989e-02
13824   912673    9.802e-04 6.299e-04 1.196e-01 1.749e-06 4.87 2.106e-02 1.767e-06 4.97 7.399e-02 8      7.029e-02
21952   1442897   1.672e-03 9.588e-04 1.238e-01 8.213e-07 4.90 4.311e-02 8.207e-07 4.98 1.383e-01 8      6.970e-02
32768   2146689   2.598e-03 1.489e-03 1.270e-01 4.263e-07 4.91 5.220e-02 4.220e-07 4.98 2.027e-01 8      6.985e-02
64000   4173281   5.452e-03 3.194e-03 1.318e-01 1.418e-07 4.93 1.151e-01 1.387e-07 4.99 4.449e-01 8      6.957e-02
110592  7189057   8.965e-03 5.382e-03 1.346e-01 5.748e-08 4.95 1.906e-01 5.582e-08 4.99 7.639e-01 8      6.917e-02
175616  11390625  1.413e-02 8.487e-03 1.363e-01 2.675e-08 4.96 3.182e-01 2.585e-08 4.99 1.232e+00 8      6.894e-02
262144  16974593  2.054e-02 1.245e-02 1.375e-01 1.377e-08 4.97 4.564e-01 1.327e-08 5.00 1.806e+00 8      6.863e-02
512000  33076161  4.044e-02 2.408e-02 1.389e-01 4.533e-09 4.98 9.125e-01 4.351e-09 5.00 3.555e+00 8      6.839e-02
884736  57066625  6.685e-02 4.031e-02 1.396e-01 1.826e-09 4.99 1.538e+00 1.750e-09 5.00 6.023e+00 8      6.826e-02
1404928 90518849  1.054e-01 6.306e-02 1.400e-01 8.459e-10 4.99 2.452e+00 8.102e-10 4.99 9.551e+00 8      6.807e-02
2097152 135005697 1.545e-01 9.262e-02 1.403e-01 4.342e-10 4.99 3.613e+00 4.207e-10 4.91 1.649e+01 8      6.799e-02

The final table displays a lot of information, such as the time for matrix-vector products within the conjugate gradient method and correction (mv_outer, done in double precision with the default settings) or for the matrix-vector product within the smoother (mv_inner, done in single precision with the default settings). Furthermore, the table contains numbers about the convergence rate of the multigrid iteration and the L2 discretization error after the multigrid solution, both for the full multigrid cycle (FMG) and the conjugate gradient method.

GPU support

The above experiments have also been run a GPU, using the preliminary code available here: https://github.com/kalj/dealii-cuda/. The deal.II library has preliminary support for GPU computations based on that code as well as new original developments, but the feature is still work in progress. The above examples are mostly compatible with the CUDA infrastructure and will be added to the list soon. Please open an issue for a particular setup.

Related manuscripts

  • M. Kronbichler and W. A. Wall (2018). A performance comparison of continuous and discontinuous Galerkin methods with fast multigrid solvers. SIAM J. Sci. Comput., 40(5):A3423-A3448. Relates to poisson_cube and poisson_shell.

  • K. Ljungkvist and M. Kronbichler (2017). Multigrid for Matrix-Free Finite Element Computations on Graphics Processors. Technical Report 2017-006, Department of Information Technology, Uppsala University. Relates to poisson_shell, poisson_l, minimal_surface.

Installation of the deal.II package

This suite consists of small programs that run against a compiled deal.II finite element library that in turn needs a C++ compiler adhering to the C++11 standard, an MPI implementation, and cmake. The following software packages are needed:

  • deal.II, using at least version 9.0.0, see www.dealii.org. deal.II must be configured to also include the following external packages (no direct access to these packages is necessary, except for the interface through deal.II):

  • MPI

  • p4est for providing parallel adaptive mesh management on forests of quad-trees (2D) or oct-trees (3D). For obtaining p4est, see http://www.p4est.org. p4est of at least version 2.0 is needed for running this project. Installation of p4est can be done via a script provided by deal.II (the first argument refers to the path of the deal.II source directory as obtained by cloning from github.com/dealii/dealii, the last argument specifies the desired installation directory for p4est):

/path/to/dealii/doc/external-libs/p4est-setup.sh p4est-1.1.tar.gz /path/to/p4est/install

Given these dependencies, the configuration of deal.II in a subfolder /path/to/dealii/build can be done through the following commands:

cmake \
    -D CMAKE_CXX_FLAGS="-march=native" \
    -D CMAKE_INSTALL_PREFIX="/path/to/dealii/install/" \
    -D DEAL_II_WITH_MPI="ON" \
    -D DEAL_II_WITH_LAPACK="ON" \
    -D DEAL_II_WITH_P4EST="ON" \
    -D P4EST_DIR="/path/to/p4est/install/" \
    ../deal.II

Since the matrix-free algorithms in deal.II make intensive use of advanced processor instruction sets (e.g. vectorization through AVX/AVX-512 or similar), it is recommended to enable processor-specific optimizations (second line, -march=native). The path on the third line specifies the desired installation directory of deal.II, and the last line points to the location of the source code of deal.II relative to the folder where the cmake script is run. After configuration, run

make -j8
make install

to compile deal.II and install it in the given directory. After installation, the deal.II sources and the build folder are no longer necessary (unless you find bugs in deal.II and need to modify that code). It is also possible to run against a deal.II build folder, say /path/to/dealii/build, that then combines with the include folders in the source directory.

Literature

The matrix-free infrastructure in deal.II is described in the following papers: