jameskermode/f90wrap

Handle collision in array wrapper (that breaks CI intermittently)

Opened this issue · 0 comments

I am encountering a sporadic handle collision for the python-side cache of f90wrapped fortran arrays.

This is because the pywrap keeps a cache of accessed arrays that are indexed by handle, which is the memory location of the array. When arrays are deallocated and then re-allocated, especially on memory-constrained systems, there is a small but not insignificant chance of them being placed in the exact same location. This would happen sporadically in our CI runners resulting in very hard-to-diagnose failures, that were very hard to reproduce on our user systems.

The pywrap for an array resolves to:

    @property
    def <name>(self)
        """
        Defined at \
            <loc> \
            line xxx
        
        """
        array_ndim, array_type, array_shape, array_handle = \
            _spec_f90wrapped.f90wrap_<mod_name>__array__<name>(f90wrap.runtime.empty_handle)
        if array_handle in self._arrays:
            <name> = self._arrays[array_handle]
        else:
            <name> = f90wrap.runtime.get_array(f90wrap.runtime.sizeof_fortran_t,
                                    f90wrap.runtime.empty_handle,
                                    _spec_f90wrapped.f90wrap_<mod_name>__array__<name>)
            self._arrays[array_handle] = <name>
        return <name>

The f90wrapped code for the array looks like

subroutine f90wrap_allglobal__array__allrzrz(dummy_this, nd, dtype, dshape, dloc)
    use constants
    use typedefns
    use allglobal, only: allglobal_allrzrz => <NaMe>
    use, intrinsic :: iso_c_binding, only : c_int
    implicit none
    integer, intent(in) :: dummy_this(2)
    integer(c_int), intent(out) :: nd
    integer(c_int), intent(out) :: dtype
    integer(c_int), dimension(10), intent(out) :: dshape
    integer*8, intent(out) :: dloc
    
    nd = x
    dtype = y
    if (allocated(<array_name>)) then
        dshape(1:x) = shape(<array_name>)
        dloc = loc(<array_name>)
    else
        dloc = 0
    end if
end subroutine f90wrap_<mod_name>__array__<array_name>

Could a different handle generator be used than the pointer, like a hash (though this would change if the data in the array changes)? Or could the cache be invalidated if an array is deallocated?

Though such collisions are unlikely, they are more likely to pop up in CI runners because they are: a) more memory constrained and b) they run similar tests many times, de-alllocating and allocating the same arrays over and over again.

What will hopefully solve this for us is manually clearing the modules' cache when instantiating a class that uses the f90wrapped code with my_mod._arrays = {}, but the error we found seems to hint at a deeper issue

P.S. f90wrap is being used with much effect to wrap magneto hydrodynamic equilibrium codes and optimize stellarator fusion reactors, thanks for the great work!