AMReX-Astro/Microphysics

Initializing pivot to zero in VODE causes out of boundary access in linpack functions

rporcu opened this issue · 7 comments

rporcu commented

When applying the following changes

diff --git a/integration/VODE/vode_dvstep.H b/integration/VODE/vode_dvstep.H
index 6a77bc94..1716c90e 100644
--- a/integration/VODE/vode_dvstep.H
+++ b/integration/VODE/vode_dvstep.H
@@ -133,6 +133,9 @@ int dvstep (BurnT& state, DvodeT& vstate)
     }
 
     Array1D<short, 1, int_neqs> pivot;
+    for (int i = 1; i <= int_neqs; ++i) {
+      pivot(i) = 0;
+    }
 
     // Compute the predicted values by effectively
     // multiplying the yh array by the Pascal triangle matrix.

and compiling in debug mode, I get the following out-of-bounds access error:
0::Assertion 'i >= XLO && i <= XHI' failed, file "../../subprojects/amrex/Src/Base/AMReX_Array.H", line 227 !!!

I am running the following benchmark:
dx/dt = y; dy/dt = -x; x(0) = 1; y(0) = 0 with solution x = cos(t); y = -sin(t)
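
For reference, the analytic solution quoted above can be checked numerically. The following is a minimal sketch that uses a hand-written classical RK4 integrator rather than VODE. The names `State`, `rhs`, `rk4_step`, and `integrate` are illustrative assumptions and are not part of Microphysics.

```cpp
#include <array>
#include <cmath>

// State vector (x, y) for the benchmark dx/dt = y, dy/dt = -x.
using State = std::array<double, 2>;

// Right-hand side of the benchmark system.
State rhs(const State& u) {
    return {u[1], -u[0]};
}

// One classical fourth-order Runge-Kutta step of size dt.
// (RK4 is a stand-in here; it is not the method VODE uses.)
State rk4_step(const State& u, double dt) {
    // Helper: returns a + s * b, component by component.
    auto axpy = [](const State& a, double s, const State& b) {
        return State{a[0] + s * b[0], a[1] + s * b[1]};
    };
    const State k1 = rhs(u);
    const State k2 = rhs(axpy(u, 0.5 * dt, k1));
    const State k3 = rhs(axpy(u, 0.5 * dt, k2));
    const State k4 = rhs(axpy(u, dt, k3));
    return {u[0] + dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
            u[1] + dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])};
}

// Integrate from t = 0 with x(0) = 1, y(0) = 0 using nsteps equal steps.
State integrate(double t_end, int nsteps) {
    State u{1.0, 0.0};
    const double dt = t_end / nsteps;
    for (int n = 0; n < nsteps; ++n) {
        u = rk4_step(u, dt);
    }
    return u;
}
```

Calling `integrate(10.0, 10000)` and comparing the result against `std::cos(10.0)` and `-std::sin(10.0)` gives an independent reference value for the end state of the benchmark.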
I commented out the calls to the eos and clean_state functions because, for now, I am not interested in enforcing any EOS constraint, and I set species_failure_tolerance = 1.e32 because I do not want to force the solution to lie in [0,1]. All of these changes are in my fork, on the branch oscillator_test.

My guess is that as long as the pivot array is deallocated and reallocated in the same memory area, the code runs fine: pivot picks up the values left over from the previous integration step, and those happen to be valid. When that is not the case, the code could segfault. Resetting the values to 0 before starting a new integration step (which is similar to allocating the array at a different memory address than before) triggers the out-of-bounds access.

Does this happen in a CPU build? What compiler and compiler version did you use to produce the above error?

I tried building unit_test/armonic_oscillator, but I get a build error:

make: *** No rule to make target `extern_parameters.cpp', needed by `tmp_build_dir/o/3d.gnu.DEBUG.EXE/extern_parameters.o'.  Stop.

If I re-run make, it gets past the above error.

However, when I run the executable, I don't see any errors:

Process 4402 launched: '/Users/benwibking/Microphysics-fork/unit_test/armonic_oscillator/main3d.gnu.DEBUG.ex' (arm64)
Initializing AMReX (22.03-767-g255d30f387cf)...
AMReX (22.03-767-g255d30f387cf) initialized
starting the single zone burn...
Maximum Time (s): 10
Elem (0): 1
Elem (1): 0
RHS at t = 0
Elem (0): 0
Elem (1): -1
------------------------------------
successful? 1
------------------------------------
new solution:
Elem (0):                  1
Elem (1): 1.000000527958948e-30
number of steps taken: 190000
AMReX (22.03-767-g255d30f387cf) finalized
Process 4402 exited with status = 0 (0x00000000)

Ok, if I manually apply your patch to your fork, then run, I get the error you report. The out of bounds access occurs here:

(lldb) f 10
frame #10: 0x00000001000122fc main3d.gnu.DEBUG.ex`armonic_oscillator() at linpack.H:21:22
   18  	    if (nm1 >= 1) {
   19  	        for (int k = 1; k <= nm1; ++k) {
   20  	            int l = pivot(k);
-> 21  	            Real t = b(l);
   22  	            if (l != k) {
   23  	                b(l) = b(k);
   24  	                b(k) = t;
(lldb) p l
(int) 0
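
The failing access can be reproduced in isolation. The sketch below assumes a hypothetical 1-based, bounds-checked array type (`OneBased`, a stand-in and not AMReX's `Array1D`) and a pivot loop written in the style of the frame above. The names `apply_pivots` and `pivots_out_of_range` are likewise illustrative. With every pivot entry set to 0, the read of `b(l)` falls outside the valid range [1, N].

```cpp
#include <array>
#include <stdexcept>

// Hypothetical stand-in for a 1-based, bounds-checked array.
// This is NOT AMReX's Array1D; it only mimics the index check.
template <typename T, int LO, int HI>
struct OneBased {
    std::array<T, HI - LO + 1> data{};  // zero-initialized storage

    T& operator()(int i) {
        if (i < LO || i > HI) {
            throw std::out_of_range("index outside [LO, HI]");
        }
        return data[i - LO];
    }
};

constexpr int N = 2;  // two equations, as in the oscillator benchmark

// Row-interchange loop in the style of the linpack.H frame:
// for each k, swap b(k) with b(pivot(k)).
void apply_pivots(OneBased<short, 1, N>& pivot, OneBased<double, 1, N>& b) {
    for (int k = 1; k <= N - 1; ++k) {
        const int l = pivot(k);
        const double t = b(l);  // throws if l is 0, since valid indices start at 1
        if (l != k) {
            b(l) = b(k);
            b(k) = t;
        }
    }
}

// Returns true if applying the given pivots triggers an out-of-range access.
bool pivots_out_of_range(OneBased<short, 1, N>& pivot) {
    OneBased<double, 1, N> b;
    b(1) = 1.0;
    b(2) = 2.0;
    try {
        apply_pivots(pivot, b);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

A zero-filled pivot array makes `pivots_out_of_range` return true, while a pivot array holding valid row indices (for example 2 and 2) does not. This suggests that the pivot entries must be written with valid indices before a loop of this kind reads them.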

I am not familiar with this part of the code, so perhaps @zingale can comment on it.

@zingale Should this issue be closed now, since #1456 was merged?

Let's wait for @rporcu to confirm it fixes his issue.

rporcu commented

Thank you Ben, Max, and Mike.
I have just tested #1456 and I can confirm it fixes the issue I encountered.