awslabs/palace

Nedelec elements fail to converge for OpenMP threads greater than 1

Closed this issue · 1 comments

The rings example, when run with -nt >= 2, fails in the first linear solve, diverging at the 100-iteration limit with a KSP norm of O(1e1). With -nt 1 it converges in 47 iterations. Running with MGMaxLevels results in 75 KSP iterations for both -nt 1 and -nt 6.

The spheres example solves in 9 iterations for both -nt 6 and -nt 1, suggesting the issue is restricted to Nedelec elements.

The cavity example on the tet mesh takes 9 iterations with -nt 1 and 10 with -nt 6; on the hex mesh it takes 5 iterations for both. So the problem does not always appear, but the slight increase in iteration count on the tet mesh suggests it may simply not grow fast enough to matter on this simpler problem.

The cpw_wave_uniform example takes 23 KSP iterations with -nt 1, but does not converge within 200 iterations with -nt 6. cpw_lumped_uniform does not converge with -nt 6 and multigrid, but does converge without multigrid.

This suggests the issue is:

  • Restricted to Nedelec elements.
  • Related to the restriction (R) and prolongation (P) operators in the multigrid transfer process.
  • Plausibly related to internal boundaries (though these might just make the system stiffer).
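
To make the comparison above easier to scan, here is a minimal Python sketch that tabulates the iteration counts reported in this issue and flags which cases are sensitive to threading. The `Run` type and the `classify` and `thread_sensitive` helpers are hypothetical names introduced only for this illustration; they are not part of Palace. `None` stands for "did not converge within the iteration limit". cpw_lumped_uniform is left out because no -nt 1 count was reported for it.

```python
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    """One example from this issue (hypothetical helper type, not a Palace API)."""
    case: str
    serial_iters: Optional[int]    # KSP iterations with -nt 1
    threaded_iters: Optional[int]  # KSP iterations with more than one thread; None = no convergence


# Iteration counts as reported in this issue.
RUNS = [
    Run("rings", 47, None),                  # diverged at 100 iterations with -nt >= 2
    Run("rings (with MGMaxLevels)", 75, 75),
    Run("spheres", 9, 9),
    Run("cavity (tet)", 9, 10),
    Run("cavity (hex)", 5, 5),
    Run("cpw_wave_uniform", 23, None),       # no convergence in 200 iterations with -nt 6
]


def classify(run: Run) -> str:
    """Label how the threaded run compares to the serial run."""
    if run.threaded_iters is None:
        return "fails"
    if run.serial_iters is not None and run.threaded_iters > run.serial_iters:
        return "slower"
    return "same"


def thread_sensitive(runs: List[Run]) -> List[str]:
    """Names of cases that converge serially but fail when threaded."""
    return [r.case for r in runs
            if r.serial_iters is not None and r.threaded_iters is None]


if __name__ == "__main__":
    for r in RUNS:
        print(f"{r.case:28s} {classify(r)}")
```

Only the rings and cpw_wave_uniform runs with multigrid fall into the "fails" group, which is what points at the multigrid transfer step rather than at the solver as a whole.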

The above was tested using -framework Accelerate on a Mac M1. After rebuilding from scratch with armpl instead, the problem appears to be resolved: the number of iterations for rings increases slightly with 6 threads but the solve does converge, and cpw_lumped converges in exactly 23 iterations. Closing this, as the issue appears to be related to an incorrectly configured BLAS rather than to Palace.