Better error handling
jbusecke opened this issue ยท 13 comments
During development on #3 I noticed that errors in the fortran code seem to lead in an immediate crash of the kernel. We should investigate how to catch and display these (I was not able to see the relevant message in a notebook, but did in an ipython session).
To reproduce I entered this (wildly wrong units) into ipython
import mod_aerobulk_wrap as aero
import numpy as np
ni=3
nj=6
calgo='ncar'
zt=2
zu=5
sst = np.full([ni, nj], 30.0)
t_zt = np.full([ni, nj], 20.0)
hum_zt = np.full([ni, nj], 20.0) # watch units here...
u_zu=np.full([ni, nj], 2.0)
v_zu=np.full([ni, nj], 0.0)
slp =np.full([ni, nj], 1000.0)
niter=1
ql,qh,tau_x,tau_y,evap = aero.mod_aerobulk_wrapper.aerobulk_model_noskin(calgo,zt,zu,sst,t_zt,hum_zt,u_zu,v_zu,slp,niter)
Which gives:
===================================================================
----- AeroBulk_init -----
*** Bulk parameterization to be used => "ncar"
*** Cool-skin & Warm-layer schemes will NOT be used!
*** Computational domain shape: Ni x Nj = 00003 x 00006
*** Number of time records that will be treated: 1
*** Number of iterations in bulk algos: nb_iter = 1
*** Filling the `mask` array...
*** E R R O R :
the whole domain is masked!
check unit consistency of input fields
Might be relevant to this : https://numpy.org/doc/stable/f2py/signature-file.html#f2py-statements
Also a strong argument to have a pint unit check in this ๐น @TomNicholas
I just read this from https://numpy.org/doc/stable/f2py/signature-file.html#f2py-statements
callstatement <C-expr|multi-line block>
Replaces the F2PY generated call statement to Fortran/C function with <C-expr|multi-line block>. The wrapped Fortran/C function is available as (*f2py_func).To raise an exception, set f2py_success = 0 in <C-expr|multi-line block>.
It's pretty cryptic, but basically I think we need to add this to the signature file.
The way aerobulk handles errors is to call the ctl_stop
function, defined here: https://github.com/brodeau/aerobulk/blob/03b40b2d48cc5ad4dc41b256b00c54a4d40d124e/src/mod_const.f90#L238-L278
This code calls the fortran STOP command. So the general problem we need to solve is how to pass a fortran STOP command up to python as an exception.
This SO issue is relevant: https://stackoverflow.com/questions/18021081/stop-python-code-in-fortran-module-error-using-f2py
I would start by writing a test for the expected exception.
This page is also very helpful https://github.com/pearu/f2py/wiki/FAQ2ed. They suggest we will need to modify the fortran source code ๐
This page is also very helpful https://github.com/pearu/f2py/wiki/FAQ2ed. They suggest we will need to modify the fortran source code
But what is solution 2, haha.
I am going to try my best to summarize the short hack session with @rabernat and @TomNicholas today:
The ultimate goal was to somehow enable us to call a python function from fortran (callback function) which then raises an error without a crashing python kernel.
We mostly followed these instructions.
We added this codeblock to mod_aerobulk_wrap.f90
(see also #11):
subroutine foo2(a)
integer a
if (a.gt.10) then
print*, "Fortran: a>10 is bad, stopping here."
call f2pystop(10) ! f2pystop must raise an exception that will trigger a long jump to the corresponding wrapper function
stop (10)
end if
print*,"Fortran foo: a<=10 is good, continue."
end subroutine foo2
and the following code to mod_aerobulk_wrap.pyf
:
subroutine foo2(a) ! in :m2:foo.f90
integer :: a
intent (callback) f2pystop
external f2pystop
integer status
call f2pystop(status)
end subroutine foo2
recompiled the code with pip install -e .
and then ran:
from aerobulk.aerobulk.mod_aerobulk_wrap import mod_aerobulk_wrapper as w
class F2PYSTOP(Exception):
def __call__(self, status):
raise self.__class__(status)
w.foo2(11, F2PYSTOP())
Contrary to the example this resulted in a rather gnarly error:
Fortran: a>10 is bad, stopping here.
capi_return is NULL
Call-back cb_f2pystop_in_foo2__user__routines failed.
Fatal Python error: F2PySwapThreadLocalCallbackPtr: F2PySwapThreadLocalCallbackPtr: PyLong_AsVoidPtr failed
Python runtime state: initialized
Traceback (most recent call last):
File "<ipython-input-1-8ef22c6715ac>", line 3, in __call__
__main__.F2PYSTOP: 10
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, aerobulk.aerobulk.mod_aerobulk_wrap (total: 14)
[1] 15315 abort ipython
We experimented a bit further and changing
stop (10)
to
return (10)
in the first code block enabled us to run this without crashing the python kernel. @rabernat was hopeful that we might be able to do something with the return values, but we need to continue working on this.
I found this, which I believe is the prequel to our source used above.
- I don't think that you need to deal with longjmp in Fortran. You can replace
Fortran STOP statement with a call to C function that will deal with the
longjmp stuff.
It seems like some more brute force approaches (e.g. replacing STOP statements in all fortran files) as part of the build could be considered if we find a way to use return?
Notes from our hack on 2022/5/4:
- We managed to both make the call to the main function threadsafe and raise an error without python crashing by injecting c code like this to the signature file:
! -*- f90 -*-
! Note: the context of this file is case sensitive.
python module RANGE_CHECK ! in
interface ! in :RANGE_CHECK
module range_mod ! in :RANGE_CHECK:range_check.f90
subroutine range_check(a) ! in :RANGE_CHECK:range_check.f90:range_mod
integer :: a
!intent (callback) f2pystop
!external f2pystop
!call f2pystop(a)
!threadsafe
callstatement {{ &
Py_BEGIN_ALLOW_THREADS &
(*f2py_func)(&a); &
Py_END_ALLOW_THREADS &
PyErr_SetString(PyExc_ValueError, "Oooops"); &
}}
end subroutine range_check
end module range_mod
end interface
end python module RANGE_CHECK
! This file was auto-generated with f2py (version:1.22.3).
! See:
! https://web.archive.org/web/20140822061353/http://cens.ioc.ee/projects/f2py2e
the corresponding fortran file is :
MODULE RANGE_MOD
IMPLICIT NONE
PUBLIC :: RANGE_CHECK
CONTAINS
subroutine RANGE_CHECK(a)
INTEGER :: a
if (a.gt.10) then
print*, "Fortran: a>10 is bad, stopping here NEW2."
!call f2pystop(a) ! f2pystop must raise an exception that will trigger a long jump to the corresponding wrapper function
error_raised = 1
RETURN
end if
print*,"Fortran foo: a<=10 is good, continue."
end subroutine RANGE_CHECK
subroutine ctl_stop( cd1 )
CHARACTER(len=*), INTENT(in) :: cd1
END MODULE RANGE_MOD
This will raise an error like such:
pytest -s test.py
==================================================================== test session starts =====================================================================
platform darwin -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/juliusbusecke/Code/fortran_error_test
collected 2 items
test.py Fortran: a>10 is bad, stopping here NEW2.
F Fortran foo: a<=10 is good, continue.
F
========================================================================== FAILURES ==========================================================================
______________________________________________________________________ test_range_error ______________________________________________________________________
def test_range_error():
# range_mod(12, raise_error)
> range_mod(12)
E ValueError: Oooops
test.py:15: ValueError
______________________________________________________________________ test_range_good _______________________________________________________________________
def test_range_good():
# range_mod(12, raise_error)
> range_mod(2)
E ValueError: Oooops
test.py:21: ValueError
================================================================== short test summary info ===================================================================
FAILED test.py::test_range_error - ValueError: Oooops
FAILED test.py::test_range_good - ValueError: Oooops
===================================================================== 2 failed in 0.36s ======================================================================
(fortran_error_test)
for a test script:
import pytest
from RANGE_CHECK import range_mod
errormsg = 0
def raise_error(msg):
global errormsg
print(f"raising error message {msg}")
errormsg = msg
def test_range_error():
# range_mod(12, raise_error)
range_mod(12)
assert errormsg == 12
def test_range_good():
# range_mod(12, raise_error)
range_mod(2)
assert 1 == 1
What we need to do next:
- Figure out a way to raise this error only conditionally (if the fortran code has actually failed!). This probably requires us to inspect the outputs of the fortran function in some way from c within the
callstatement
block. The easiest thing I can think of would be to just have an integer return value that we check. I think the outputs are somehow inside&a
. - Optional: figure out how to pass the fortran error messages to python. Right now they are printed to stderr. Options.
- EASY: Do nothing but note in the ValueError we raise that there may be more info in stderr
- MEDIUM: redirect stderr to something else, e.g. https://www.greghaskins.com/archive/python-forcibly-redirect-stdout-stderr-from-extension-modules
- HARD: have fortran actually send a string to python with the error message in it
- Once we figure all this out, we need to make it work with aerobulk. The easiest way would be to somehow intercept the ctl_stop subroutine and replace it with our own custom subroutine that, instead of calling STOP, sets this error flag and calls RETURN. (Does RETURN just terminate the innermost subroutine or the whole fortran program?)
Steps to reproduce results with above files:
- Build initial signature file (only do once, because we modify this):
python -m numpy.f2py range_check.f90 -m RANGE_CHECK -h range_check.pyf --overwrite-signature
- Build extensions using f2py:
python -m numpy.f2py --verbose -c --f90flags="-fdefault-real-8 -ffree-line-length-200 --std=gnu" range_check.f90 range_check.pyf
- Optional: Build c wrappers for inspection:
python -m numpy.f2py --verbose -m range_check.f90 range_check.pyf
Epic session today.
๐๐ค
We just discovered that @brian-rose is an expert on f2py! (See https://github.com/brian-rose/minimalf2py).
Brian, do you have any experience on error handling from Fortran. The above thread is very long and probably confusing, but basically we are trying to redirect Fortran STOP commands to raise Python exceptions (rather than crashing the entire session). Have you ever dealt with that situation before?
We just discovered that @brian-rose is an expert on f2py! (See https://github.com/brian-rose/minimalf2py).
... yikes. I am now about to be unmasked as a non-expert.
Unfortunately no, I've never dealt with this. To my knowledge, none of the fortran code I've ever tried to wrapped used STOP commands. I'll follow your progress here though!
If there's a good general solution (perhaps involving modifying the signature file, as you discussed above), then that might make a nice addition to minimalf2py
So I just realized that omitting all the error checking in AEROBULK_INIT
will lead to a segfault (see 2536b44). This is not helpful for the user, but it at least shows up as failed in the CI!
I think we should just bite the bullet and to the range_check in the numpy code (where we can raise useful errors). I have touched upon this in a bit more detail in #36 (comment)