Segfault for icntnunk=1 when case is converged
holm10 opened this issue · 7 comments
Describe the bug
When a case is converged (fnrm < ftol) and icntnunk is set to 1, the next exmain call causes a segfault.
To Reproduce
- Navigate to a case that is converged
- Ensure the case is well converged by executing exmain until fnrm<ftol
- Set bbb.icntnunk=1
- Call exmain
- Segfault occurs
For the Slab_geometry test case in the pytests, the following commands will reproduce the bug:
from rd_slabH_in_w_h5 import *;bbb.exmain();bbb.exmain();bbb.icntnunk=1;bbb.exmain()
Expected behavior
Execute exmain, print initial fnrm, set iterm=1, return to prompt
Additional context
Several users have encountered this at various points while using the code, most commonly in time-dependent runs where icntnunk is actively switched on and off. The bug does not appear to occur for Basis versions.
Bug traced to odesetup.m conditional:
Line 6555 in cc15c0d
Segfault occurs due to xerrab call at:
Line 6556 in cc15c0d
Using this information, the expected behavior can be achieved by executing the following commands in the Slab_geometry test directory:
bbb.exmain();bbb.exmain();bbb.icntnunk=1;bbb.ijactot=2;bbb.exmain()
It appears the issue is with ijactot. Presumably, this flag exists to ensure that a Jacobian exists before continuing execution with icntnunk=1, which assumes one is available. However, it is not clear to me why ijactot=1 is insufficient for the routine to proceed.
On a general note, xerrab calls in the Python version seem to induce segfaults rather than the expected behavior of printing the error message and returning to prompt. This should probably be fixed.
The following line(s) appear to explicitly prohibit ijactot from exceeding 1:
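Until xerrab is fixed, a user-side check before calling exmain can avoid reaching it. This is only a sketch: safe_exmain and min_ijactot are hypothetical names, the threshold of 2 is my assumption about what the current conditional requires, and the stand-in object below replaces the real bbb package so the snippet runs on its own.

```python
from types import SimpleNamespace


def safe_exmain(bbb, min_ijactot=2):
    """Call bbb.exmain() unless the continuation check would fail.

    min_ijactot=2 is an assumption mirroring the current behavior,
    where ijactot=1 is rejected when icntnunk=1.
    """
    if bbb.icntnunk == 1 and bbb.ijactot < min_ijactot:
        # Raise a Python exception instead of letting the code reach xerrab.
        raise RuntimeError(
            f"icntnunk=1 requires ijactot >= {min_ijactot}, got {bbb.ijactot}"
        )
    bbb.exmain()


# Stand-in for the real bbb package, for illustration only.
calls = []
fake_bbb = SimpleNamespace(
    icntnunk=1, ijactot=1, exmain=lambda: calls.append("exmain")
)

try:
    safe_exmain(fake_bbb)
except RuntimeError as err:
    print("refused:", err)

fake_bbb.ijactot = 2
safe_exmain(fake_bbb)  # now passes the check and calls exmain
```

With the real package, the call would be safe_exmain(bbb) in place of bbb.exmain(). This only sidesteps the segfault and does not fix xerrab itself.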
Line 6559 in cc15c0d
However, after the first exmain call, ijactot=2. It appears ijactot is only advanced in jac_calc:
Line 8427 in cc15c0d
During the first exmain call there are 2 Jacobian updates, explaining why ijactot=2. This means exmain can be executed with icntnunk=1 only after calls in which more than one Jacobian update was necessary to obtain convergence.
The suggested solution is to change the conditional on icntnunk=1 to ijactot<1. I don't see why more than one Jacobian evaluation is necessary for exmain calls with icntnunk=1.
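To make the difference concrete, here is a toy comparison of the two checks. It is not the odesetup.m code: guard_current assumes the existing conditional is equivalent to ijactot<=1 (inferred from ijactot=1 being rejected and ijactot=2 being accepted), and both function names are made up for illustration.

```python
def guard_current(icntnunk, ijactot):
    """Assumed current check: error unless more than one Jacobian was counted."""
    return icntnunk == 1 and ijactot <= 1


def guard_proposed(icntnunk, ijactot):
    """Suggested check: error only if no Jacobian was counted at all."""
    return icntnunk == 1 and ijactot < 1


# True means the check fails and xerrab would be called.
for ijactot in (0, 1, 2):
    print(ijactot, guard_current(1, ijactot), guard_proposed(1, ijactot))
```

The only case where the two differ is ijactot=1, which is exactly the converged case reported here.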
One issue arises if the Jacobian calculation is aborted mid-execution. In that case ijactot=1 but the preconditioner is only partially evaluated, so with the conditional relaxed to ijactot<1, an exmain call with icntnunk=1 immediately after the abort leads to errors. This is because ijactot += 1 occurs at the beginning of subroutine jac_calc rather than upon successful completion:
Line 8427 in cc15c0d
Moving the above line to around here would solve this issue:
Line 8575 in cc15c0d
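A toy model of why the placement matters. Everything here (State, Aborted, cols_done, count_on_entry) is hypothetical and only mimics the counting, not the actual jac_calc subroutine:

```python
class Aborted(Exception):
    """Stands in for a user interrupt during the Jacobian evaluation."""


class State:
    def __init__(self):
        self.ijactot = 0    # number of Jacobian evaluations counted
        self.cols_done = 0  # columns of the latest Jacobian actually filled


def jac_calc(state, ncols, abort_at=None, count_on_entry=True):
    """Fill ncols columns, optionally aborting before column abort_at."""
    if count_on_entry:
        state.ijactot += 1  # current placement: counted before any work
    state.cols_done = 0
    for col in range(ncols):
        if col == abort_at:
            raise Aborted(f"aborted at column {col}")
        state.cols_done += 1
    if not count_on_entry:
        state.ijactot += 1  # suggested placement: counted only on completion


def run(count_on_entry):
    state = State()
    try:
        jac_calc(state, ncols=4, abort_at=2, count_on_entry=count_on_entry)
    except Aborted:
        pass
    return state.ijactot, state.cols_done


print(run(count_on_entry=True))   # counter advanced despite a partial Jacobian
print(run(count_on_entry=False))  # counter untouched after the abort
```

With the increment at the end, an aborted evaluation leaves ijactot at 0, so a relaxed ijactot<1 check would still catch it.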
However, failing the conditional and triggering an xerrab call will still segfault, which is a separate issue that needs to be resolved.
Suggested fix implemented and tested. Closing as resolved.