LLNL/sundials

Crash when using the dense linear solver and Newton nonlinear solver with a relatively large number of equations

Closed this issue · 3 comments

Hi,

I am using a CVODE solver initialized with CV_ADAMS, together with the dense linear solver and the default Newton nonlinear solver (I do not call CVodeSetNonlinearSolver).

The initialization succeeds, but the call to CVode then crashes. The backtrace from GDB is:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffb5043644 in N_VSetArrayPointer () from /home/dima/.conda/envs/um02-open-interfaces/lib/libsundials_cvode.so.6
(gdb) bt
#0  0x00007fffb5043644 in N_VSetArrayPointer ()
   from /home/dima/.conda/envs/um02-open-interfaces/lib/libsundials_cvode.so.6
#1  0x00007fffb503910a in cvLsDenseDQJac ()
   from /home/dima/.conda/envs/um02-open-interfaces/lib/libsundials_cvode.so.6
#2  0x00007fffb503a34f in cvLsLinSys ()
   from /home/dima/.conda/envs/um02-open-interfaces/lib/libsundials_cvode.so.6
#3  0x00007fffb50386f7 in cvLsSetup ()
   from /home/dima/.conda/envs/um02-open-interfaces/lib/libsundials_cvode.so.6
#4  0x00007fffb503b714 in cvNlsLSetup ()
   from /home/dima/.conda/envs/um02-open-interfaces/lib/libsundials_cvode.so.6
#5  0x00007fffb5054b1d in SUNNonlinSolSolve_Newton ()
   from /home/dima/.conda/envs/um02-open-interfaces/lib/libsundials_cvode.so.6
#6  0x00007fffb502fff3 in CVode ()
   from /home/dima/.conda/envs/um02-open-interfaces/lib/libsundials_cvode.so.6

I actually use CVode from Python with my own bindings, so it is difficult for me to provide a minimal example.
However, I get the same crash when I use the [scikits.odes](https://github.com/bmcage/odes) bindings.

  • When I switch to the fixed-point option in scikits.odes, the code works.
  • The problem must be related to the number of equations: there are 524288, as I solve a 2D reaction-diffusion system.
  • Unit tests with a small number of equations (1, 2, 4) pass.

Am I misunderstanding something? I was initially using the SPGMR solver. However, I noticed that scikits.odes uses the dense linear solver by default, and it runs much faster for these particular computations, provided one switches to the fixed-point nonlinear solver.

Anyway, there is no good error message, just a segmentation fault, so I have decided to report a bug 😄

It sounds to me like you are running out of memory, which causes the segmentation fault.

With 524288 equations, the dense matrix will have 524288*524288 > 274 billion entries. Assuming you are using double-precision arithmetic, a single copy of this matrix will require over 2 terabytes of memory. CVODE should require two matrices, resulting in over 4 TB of memory for just these matrices.
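The arithmetic above can be checked with a few lines of Python. This is only a sketch: `dense_matrix_bytes` is a hypothetical helper, and the 8 bytes per entry reflects the double-precision assumption stated above.

```python
def dense_matrix_bytes(n_equations, bytes_per_entry=8):
    """Bytes needed to store one dense n x n matrix (hypothetical helper)."""
    return n_equations * n_equations * bytes_per_entry


n = 524288                         # number of equations reported in this issue
entries = n * n                    # 2**38 entries
one_copy = dense_matrix_bytes(n)   # 2**41 bytes, assuming 8-byte doubles

print(f"entries:    {entries:,}")
print(f"one matrix: {one_copy / 1e12:.2f} TB")
print(f"two copies: {2 * one_copy / 1e12:.2f} TB")
```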

When you run with the fixed-point nonlinear solver, there is no Jacobian matrix to store, which is why it can run correctly.

I recommend pairing the Newton nonlinear solver with either an iterative linear solver or a sparse Jacobian and sparse linear solver. The iterative linear solvers do not need to store a Jacobian matrix at all. Alternatively, since the Jacobian for your 2D reaction-diffusion problem is probably extremely sparse, a sparse storage format and solver could also work.
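To give a rough sense of how sparse such a Jacobian might be, here is a minimal sketch that counts nonzeros. Everything about the discretization is an assumption, since the issue does not describe it: a 512 x 512 grid with 2 species (which happens to give 524288 unknowns), a 5-point diffusion stencil, non-periodic boundaries, and reaction terms that couple the species only at the same grid point. The helper names and the compressed-column size estimate with 8-byte values and indices are likewise illustrative.

```python
def stencil_nnz(nx, ny, n_species=2):
    """Count Jacobian nonzeros for a 5-point stencil with local reaction coupling."""
    nnz = 0
    for i in range(nx):
        for j in range(ny):
            # Neighbours that exist on a non-periodic grid (interior points have 4).
            neighbours = (i > 0) + (i < nx - 1) + (j > 0) + (j < ny - 1)
            # Per species row: itself, its grid neighbours (diffusion), and the
            # other species at the same point (reaction).
            nnz += n_species * (1 + neighbours + (n_species - 1))
    return nnz


def csc_bytes(n, nnz, value_bytes=8, index_bytes=8):
    """Approximate compressed-column storage: values, row indices, column pointers."""
    return nnz * (value_bytes + index_bytes) + (n + 1) * index_bytes


n = 512 * 512 * 2                  # assumed layout giving 524288 unknowns
nnz = stencil_nnz(512, 512)
print(f"nonzeros:       {nnz:,}")
print(f"sparse storage: {csc_bytes(n, nnz) / 1e6:.1f} MB")
print(f"dense storage:  {n * n * 8 / 1e12:.2f} TB")
```

Under these assumptions the sparse Jacobian has about 3.1 million nonzeros and needs tens of megabytes, compared with terabytes for the dense matrix.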

Yes, I also thought that the matrix is huge and this is the problem.

What is unclear to me is that the creation of the matrix appears to work, and the CVodeSetLinearSolver call also succeeds. The crash happens only when the actual integration starts (the call to CVode).

I reported this because the code crashes with "Segmentation fault". Could this be improved by providing a better error message and graceful termination?

I have realized that I was not checking the return value of the SUNDenseMatrix function.
After adding a check for NULL, the program can indeed be terminated immediately.
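For hand-written Python bindings, a check like this could be wrapped in a small guard. The sketch below is an assumption about how such bindings might look, not code from SUNDIALS or scikits.odes: `require_non_null` is a hypothetical helper, and it relies on the fact that ctypes returns `None` for a NULL result when the function's `restype` is `c_void_p`. Stand-in values are used instead of a real library call.

```python
def require_non_null(handle, what):
    """Return handle unchanged, or raise if it represents a NULL pointer.

    Hypothetical helper, not part of SUNDIALS or scikits.odes.
    """
    # ctypes maps a NULL c_void_p result to None; a raw integer address of 0
    # is treated the same way here.
    if handle is None or handle == 0:
        raise MemoryError(f"{what} returned NULL (allocation probably failed)")
    return handle


# Usage with stand-in values instead of a real library call:
ok = require_non_null(0x7F00DEAD0000, "SUNDenseMatrix")
try:
    require_non_null(None, "SUNDenseMatrix")
except MemoryError as exc:
    message = str(exc)
    print(message)
```

Raising a Python exception at the point where the matrix is created turns the later segmentation fault into an ordinary error that the caller can report or handle.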