CUDA implementation of parallel domain decomposition methods for preconditioning iterative solvers to the Helmholtz equation.
We consider the Helmholtz equation with zero-order absorbing boundary conditions:
$$-\Delta U - \omega^2 \alpha^2(x) U=f, \qquad \forall x\in\Omega,$$
$$\partial_{\mathbf{n}} U + i \alpha(x) \omega U=0, \qquad \forall x\in\partial\Omega.$$
Here $\Omega$ is an open and simply connected subset of $\mathbb{R}^2$, $\alpha$ is a real valued positive function, and $f$ is a real valued function.
We solve the Helmholtz equation via the finite element method.
Let $U = u + i v$, the weak formulation is for all $\phi\in H^1(\Omega)$
$$\begin{align*}
(\nabla u, \nabla \phi) - \omega^2 (\alpha^2 u, \phi) - \omega \langle \alpha v,\phi \rangle &= (f,\phi), \\
(\nabla v, \nabla \phi) - \omega^2 (\alpha^2 v, \phi) + \omega \langle \alpha u,\phi \rangle &= 0.
\end{align*}$$
Here
$$(f, g) = \int_\Omega f g \, dx, \qquad \langle f, g \rangle = \int_{\partial\Omega} f g \, ds.$$
Let $\{\phi_i\}_{i=1}^n$ be the FE basis functions, and define the matrices
$$S_{ij} = (\nabla \phi_i, \nabla \phi_j), \quad M_{ij} = (\alpha\phi_i, \phi_j), \quad H_{ij} = \langle \phi_i, \phi_j \rangle.$$
Let $F_i = (f, \phi_i)$. Then the solutions $u_h, v_h$ are given by
$$u_h = \sum_{i=1}^n \hat{u}_i \phi_i, \qquad v_h = \sum_{i=1}^n \hat{v}_i \phi_i.$$
With the coefficients $\hat{u}, \hat{v}$ satisfying
$$\begin{pmatrix}
S-\omega^2M & -\omega H \\
-\omega H & \omega^2M-S
\end{pmatrix} \begin{pmatrix}
\hat{u} \\
\hat{v}
\end{pmatrix} = \begin{pmatrix}
F \\
0
\end{pmatrix}.$$
Krylov space methods for solving the Helmholtz equation are known to converge very slowly.
In addition, for high frequency problems, we must take $n$ very large, so GMRES must be restarted every $m \ll n$ Arnoldi iterations as the cost scales like $O(nm^2)$ per iteration.
With restarts, however, convergence is slower still, so a good preconditioner is essential in order to solve the Helmholtz equation efficiently.
Here we consider a domain decomposition method.