ECP-WarpX/WarpX

Convergence Issues with 2D Laser-Ion Acceleration Example with Implicit Solver

n01r opened this issue · 21 comments

n01r commented

I tried to convert the 2D laser-ion acceleration from planar targets example to use the implicit solver, for an initial comparison before running larger cases.

I went for the PS-JFNK solver from this open pull request #4736.
Unfortunately, I could not get the example to run to completion (200 fs of simulation time).
I got past the 100 steps that are the example default by relaxing the tolerances of the solvers from 1.0e-8 or 1.0e-12 to 1.0e-5.
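For reference, these are roughly the solver controls I relaxed in the inputs file. This is only a sketch; the parameter names follow the implicit-solver options documented with the PR and may differ slightly once it is merged:

# sketch of the relaxed solver controls (names as in the implicit-solver docs of #4736)
algo.evolve_scheme = theta_implicit_em
implicit_evolve.nonlinear_solver = newton
newton.verbose = 1
newton.max_iterations = 20
newton.relative_tolerance = 1.0e-5   # relaxed from 1.0e-12
gmres.verbose_int = 2
gmres.max_iterations = 1000
gmres.relative_tolerance = 1.0e-5    # relaxed from 1.0e-8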

n01r commented

I will upload the input and output files to this issue.

n01r commented

This is a case where I used a CFL of 1.001, but since it is still basically the checked-in example, the resolution is quite poor. Therefore $\omega_\mathrm{p} \cdot \mathrm{d}t$ is very large, and that might make the Newton solver diverge.
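(For context, assuming the usual electron plasma frequency and the standard 2D FDTD time-step limit,

$$\omega_\mathrm{p} = \sqrt{\frac{n_e e^2}{\varepsilon_0 m_e}}, \qquad \mathrm{d}t = \mathrm{CFL}\cdot\frac{1}{c\sqrt{1/\Delta x^2 + 1/\Delta z^2}},$$

the coarse cells force a large $\mathrm{d}t$ at fixed CFL, which is what pushes $\omega_\mathrm{p}\,\mathrm{d}t$ up for the near-solid densities of the target.)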

WarpX.o24283134.txt
WarpX.e24283134.txt
perlmutter_cpu.sbatch.txt
output_24283134.txt
inputs_2d.txt
Backtrace.0.0.txt

n01r commented

The new game plan is to start from https://github.com/ECP-WarpX/WarpX/blob/d771762e664d2005c0d47414471052a58d564318/Examples/Tests/Implicit/inputs_vandb_jfnk_2d and

  • increase the plasma temperature to 100 keV
  • implement a vacuum-plasma interface (slab; see the sketch below)
  • add laser pulse and go back to 100 eV (and lower density?)
  • increase $\Delta t$ to check how convergence behaves

@JustinRayAngus, do you have any more ideas or parameter ranges that I should test or avoid?
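For the vacuum-plasma interface step, I am thinking of something along these lines (a sketch only; the species name, density, and extents are placeholders, using the parse_density_function profile):

# hypothetical slab between z_lo and z_hi surrounded by vacuum
my_constants.n0   = 1.e28    # placeholder density [m^-3]
my_constants.z_lo = 0.e-6
my_constants.z_hi = 2.e-6
electrons.profile = parse_density_function
electrons.density_function(x,y,z) = "n0 * (z > z_lo) * (z < z_hi)"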

The JFNK solver is not set up to use preconditioning. Thus, the algorithm is currently limited in how large wpdt can be.
How large is wpdt? Is the linear solver converging? How many iterations for the linear solver?

n01r commented

That's fair :)
For our ion example it's really bad so that it runs locally. It is $\omega_\mathrm{p} \mathrm{d}t = 1.4$.

I also ran a few other cases, and one had $\omega_\mathrm{p} \mathrm{d}t = 0.4$, I think. But Perlmutter is down today and I cannot look.

The GMRES linear solver usually converged within 2 to 10 iterations, except for the last step, where it took over 300 iterations and crashed.

I will test it more carefully, starting from something that works and has good $\omega_\mathrm{p} \mathrm{d}t$ as mentioned above.

I'm not sure I understand what you mean by "For our ion example it's really bad so that it runs locally".

wpdt = 1.4 should work just fine even without preconditioning; it will just be slow.

In general, I would limit things to wpdt and the CFL for light waves both being of order unity for now. We can't efficiently push to larger values until we make some further progress on the implicit solver implementation.

2-10 iterations for GMRES to converge is fine. If it jumped up to 300 abruptly, then it seems like something else went wrong. This is where having in-situ diagnostics for the solvers would be useful. We don't have that yet.

I'm at a conference this week. I'm happy to discuss next week sometime if needed.

n01r commented

Sorry, what I meant was that we ship this example knowing full well that $\omega_\mathrm{p}\mathrm{d}t = 1.4$ is a terrible resolution for modeling laser-solid interaction in explicit PIC, and we state that users should increase the resolution to get more meaningful results if they have the resources.

But it is interesting that you say it should have worked. I will run my next tests and then maybe we can discuss. We have a KISMET meeting slot at 1 PM PDT next Tuesday, which is usually reserved for thrust 3, but I'm also happy to join a meeting that I am usually not a part of if that is more convenient. :)

@n01r @JustinRayAngus
I think that a key point here is that even at the first timestep, the Newton solver is diverging instead of converging:

STEP 1 starts ...
Newton: iteration =   0, norm = 8.17221e+12 (abs.), 1.00000e+00 (rel.)
GMRES: iter = 0, residual = 8.172207803e+12, 1 (rel.)
GMRES: iter = 1, residual = 6.162510298e+12, 0.7540814485 (rel.)
GMRES: iter = 2, residual = 5.222627849e+11, 0.06390718365 (rel.)
GMRES: iter = 3, residual = 5.783763048e+10, 0.007077356802 (rel.)
GMRES: iter = 4, residual = 6814656611, 0.0008338819539 (rel.)
GMRES: iter = 5, residual = 1080164271, 0.0001321753309 (rel.)
GMRES: iter = 6, residual = 165101477.9, 2.020279977e-05 (rel.)
GMRES: Solve Time = 1.049196633
Newton: iteration =   1, norm = 4.22502e+13 (abs.), 5.16998e+00 (rel.)
GMRES: iter = 0, residual = 4.225018675e+13, 1 (rel.)
GMRES: iter = 1, residual = 943288776.1, 2.232626288e-05 (rel.)
GMRES: Solve Time = 0.157849184
Newton: iteration =   2, norm = 5.29848e+13 (abs.), 6.48354e+00 (rel.)
GMRES: iter = 0, residual = 5.298482981e+13, 1 (rel.)
GMRES: iter = 1, residual = 998114771.4, 1.883774611e-05 (rel.)
GMRES: Solve Time = 0.157490169
Newton: iteration =   3, norm = 6.37195e+13 (abs.), 7.79709e+00 (rel.)
GMRES: iter = 0, residual = 6.371946322e+13, 1 (rel.)
GMRES: iter = 1, residual = 998253851.4, 1.566638827e-05 (rel.)
GMRES: Solve Time = 0.156699158
Newton: iteration =   4, norm = 7.44541e+13 (abs.), 9.11065e+00 (rel.)
GMRES: iter = 0, residual = 7.445408698e+13, 1 (rel.)
GMRES: iter = 1, residual = 998432406.1, 1.341004163e-05 (rel.)
GMRES: Solve Time = 0.15714457
Newton: iteration =   5, norm = 8.51887e+13 (abs.), 1.04242e+01 (rel.)
GMRES: iter = 0, residual = 8.518870109e+13, 1 (rel.)
GMRES: iter = 1, residual = 998656311.5, 1.172287285e-05 (rel.)
GMRES: Solve Time = 0.157046251

(This is from output_24283134.txt that Marco posted above.)

Note the increase in the residual norm of the Newton solve:

Newton: iteration =   0, norm = 8.17221e+12 (abs.), 1.00000e+00 (rel.)
Newton: iteration =   1, norm = 4.22502e+13 (abs.), 5.16998e+00 (rel.)
Newton: iteration =   2, norm = 5.29848e+13 (abs.), 6.48354e+00 (rel.)
...
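(For context, assuming the standard inexact-Newton bookkeeping that the log suggests: each outer iteration solves $J(x_k)\,\delta x_k = -F(x_k)$ with GMRES and sets $x_{k+1} = x_k + \delta x_k$, and the reported norm is $\lVert F(x_k)\rVert_2$; it should decrease from one Newton iteration to the next, whereas here it grows.)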

@JustinRayAngus Could you confirm that this is unexpected, considering that $\omega_p \Delta t$ is "only" 1.4?

Yes. This behavior is not expected. I have some ideas of what it could be. I had similar issues originally. It may have to do with the large scale of the problem (the norm is ~1e12). There is an inherent assumption that the scales are not too large or too small for the finite-difference approximation to the Jacobian. We should discuss.
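(To spell that out, and not necessarily in the exact form used in the PR: JFNK never builds $J$ explicitly but approximates its action on a vector by a finite difference,

$$ J(x)\,v \;\approx\; \frac{F(x + \epsilon v) - F(x)}{\epsilon}, $$

with $\epsilon$ chosen relative to the magnitudes of $x$ and $v$. If the problem scale is of order $10^{12}$, a poorly chosen $\epsilon$ can make this approximation, and hence the Newton direction, inaccurate.)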

n01r commented

Hi @JustinRayAngus, should we schedule a meeting for this?

What determines the scale of the problem?

n01r commented

@RemiLehe, @ax3l and I just talked about how we can maybe talk about it in the Friday meeting.

We could talk about it Friday at the regular meeting time.

I need to know more about the problem and setup. I'm not sure what is going on. I no longer think it's the scale, since the scale here is about the same as it is for the regression test.

I should comment that I tried using a large time step and found that GMRES always segfaults when the iterations exceed 30. @WeiqunZhang

Suggested steps in the meantime:

  1. Run for 10 steps and look at warnings after. Are the particles converging?
  2. Run with semi-implicit and use time step that resolves light waves
  3. Does it converge for smaller values of Dt?

To answer your previous question, the scale of the problem is the L2 norm of Ampere's law.
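(Schematically, and subject to the exact time-centering and factors used in the implementation, the residual whose L2 norm is reported is of the form

$$ F(E) \;=\; E - E^{n} - c^2\,\theta\,\Delta t\,\bigl[\nabla\times B(E) - \mu_0\, J(E)\bigr], $$

so its overall size tracks the electric-field values on the grid.)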

Are there particles going in/out of a physical boundary for this problem?

I should comment that I tried using a large time step and found that GMRES always segfaults when the iterations exceed 30. @WeiqunZhang

30 is when restart begins. I will look into this. In my testing of GMRES with MLMG, restarting seems to work.

n01r commented

Are there particles going in/out of a physical boundary for this problem?

The specific example has a 2D plasma slab that touches the boundaries. Particle boundaries are set to absorbing, so in principle particles can leave the box. Ions are at rest, but the electrons have an initial temperature, so it is possible that particles already leave during the first time step.
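Concretely, the relevant lines in the inputs file look like this (2D, so the lo/hi values apply to x and z):

# absorbing particle boundaries on all sides of the 2D box
boundary.particle_lo = absorbing absorbing
boundary.particle_hi = absorbing absorbing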

There is a restart issue in the linear operator class for GMRES. If I set GMRES's restart length to 4, I get

GMRES: iter = 0, residual = 8.126534683e+12, 1 (rel.)
GMRES: iter = 1, residual = 6.101812917e+12, 0.7508505354 (rel.)
GMRES: iter = 2, residual = 5.206856998e+11, 0.06407229159 (rel.)
GMRES: iter = 3, residual = 5.739630096e+10, 0.007062826064 (rel.)
GMRES: iter = 4, residual = 6795482960, 0.0008362091871 (rel.)
GMRES: iter = 4, residual = 4.863134414e+12, 0.598426587 (rel.)
GMRES: iter = 5, residual = 1.40034024e+12, 0.17231702 (rel.)
...

Although it did not segfault like @JustinRayAngus mentioned, the residual went back up a lot after the restart.

I did a test using amrex/Tests/LinearSolvers/CurlCurl/, and I got

GMRES: iter = 0, residual = 33321.69994, 1 (rel.)
GMRES: iter = 1, residual = 4207.924549, 0.126281809 (rel.)
GMRES: iter = 2, residual = 48.69181953, 0.00146126457 (rel.)
GMRES: iter = 3, residual = 0.2033884438, 6.103783544e-06 (rel.)
GMRES: iter = 4, residual = 0.002315450854, 6.948777697e-08 (rel.)
GMRES: iter = 4, residual = 0.002315450852, 6.94877769e-08 (rel.)
GMRES: iter = 5, residual = 7.67603372e-05, 2.303614081e-09 (rel.)
...

Here we see that the restart worked. So I think the issue is probably in JacobianFunctionMF or what it uses.
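(For reference, at a restart GMRES recomputes the true residual $r = b - A\,x_m$ from the current iterate and builds a fresh Krylov space from it. If the action $A\,x$ returned by the Jacobian wrapper is not consistent with what was used during the preceding iterations, that recomputed residual jumps, which is what the first log above shows.)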

n01r commented

I have begun to run small tests, too (applying one change after the other and running each separately).
I started from inputs_vandb_jfnk_2d and made changes towards a setup more similar to mine.
So far everything still converged, and I could not reproduce the diverging solves.

  • Raise electron temperature to 100 keV ✔️ - converged
  • Add absorbing boundaries ✔️ - converged
  • Vacuum-plasma interface (slab) ✔️ - converged
  • Add laser pulse and go back to 100 eV (and lower density?)
  • Increase $\Delta t$

n01r commented

I recently ran the laser-ion example with the implicit solver and removed the laser pulse, to test the hypothesis that the diverging solves come from the laser particles depositing current that the implicit solver does not account for.

In this test, the implicit solver successfully converged. โœ”๏ธ
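For reference, the laser in the example is injected via a laser antenna of roughly this form (the values shown here are placeholders); commenting out lasers.names is what removes the pulse for this test:

# laser antenna of the example (placeholder values)
lasers.names        = laser1
laser1.profile      = Gaussian
laser1.position     = 0. 0. -4.e-6   # antenna location
laser1.direction    = 0. 0. 1.
laser1.polarization = 1. 0. 0.
laser1.e_max        = 1.e13          # placeholder peak field [V/m]
laser1.wavelength   = 0.8e-6
laser1.profile_waist = 4.e-6
laser1.profile_duration = 30.e-15
laser1.profile_t_peak = 30.e-15
laser1.profile_focal_distance = 4.e-6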

Here are the input and output files:

@WeiqunZhang
I fixed the restart issue in a commit a few days ago. Pull and try again.
STEP 7 starts ...
Newton: iteration = 0, norm = 2.13996e+11 (abs.), 1.00000e+00 (rel.)
GMRES: iter = 0, residual = 2.139962625e+11, 1 (rel.)
GMRES: iter = 1, residual = 1.512678965e+10, 0.07068716749 (rel.)
GMRES: iter = 2, residual = 1042330173, 0.004870786811 (rel.)
GMRES: iter = 3, residual = 70643579.59, 0.0003301159505 (rel.)
GMRES: iter = 4, residual = 4083515.776, 1.908218269e-05 (rel.)
GMRES: iter = 4, residual = 4083447.449, 1.90818634e-05 (rel.)
GMRES: iter = 5, residual = 379243.6816, 1.772197688e-06 (rel.)

OK, I think I fixed the lack of convergence of the nonlinear solver with this commit:
b28a845