JCSDA/pycrtm

Fix MPI so we don't have two OMP loops

Closed this issue · 3 comments

The first thing we need to do is remove the outer OMP loops in pycrtm.f90, so the OpenMP threading under the hood in CRTM doesn't compete with the outer loops in pycrtm's wrap_forward and wrap_k_matrix.

One tricky part may be memory management. Loading all the profiles at once into the CRTM atmosphere structure could cause memory problems for cases with lots of profiles (part of why I loaded and created one profile at a time when filling a CRTM_Atmosphere_type). It might make sense to do it in user-adjustable chunks (maybe default to a single chunk, or, if the number of profiles exceeds some large number, split into 10 chunks).
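To make the chunking idea concrete, here is a minimal Python sketch of how the profile indices could be split. Everything here is an assumption for illustration: `chunk_bounds` and `choose_n_chunks` are hypothetical helpers, not part of pycrtm, and the 10000-profile threshold is a placeholder for "some large number".

```python
def choose_n_chunks(n_profiles, threshold=10000, n_split=10):
    """Pick a chunk count: one chunk by default, n_split chunks for large inputs.

    threshold and n_split are hypothetical defaults, not values used by pycrtm.
    """
    return n_split if n_profiles > threshold else 1


def chunk_bounds(n_profiles, n_chunks=1):
    """Return (start, stop) index pairs that cover range(n_profiles).

    The chunks are contiguous and differ in size by at most one profile.
    """
    if n_profiles < 0 or n_chunks < 1:
        raise ValueError("n_profiles must be >= 0 and n_chunks must be >= 1")
    if n_profiles == 0:
        return []
    # Never create more chunks than there are profiles.
    n_chunks = min(n_chunks, n_profiles)
    base, extra = divmod(n_profiles, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional profile each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds
```

Each (start, stop) pair could then be used to load and run one slice of profiles at a time, so only one chunk has to live in the CRTM atmosphere structure at once.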

I have what I think will work as a basic fix, but it's still not ideal. In https://github.com/JCSDA/pycrtm/tree/feature/bmk_restore_openmp_simple, I set the number of threads to 1 before going into CRTM, letting the outer OMP loop in pycrtm handle the threading. I ran some of the tests repeatedly while changing the number of threads, and I didn't see the intermittent failures we got previously. Then again, the failure was intermittent, so that makes it tricky to test.

In https://github.com/JCSDA/pycrtm/tree/feature/bmk_restore_openmp_invasive, I tried to let the OpenMP under the hood in CRTM take control by taking the OpenMP stuff out of pycrtm and just using the OMP_NUM_THREADS environment variable to control the number of threads. It runs, but with OMP_NUM_THREADS > 1 the tests fail: some profiles check out OK, but others are scrambled, with higher brightness temperatures for some or all channels.
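One way to pin down which profiles get scrambled is to compare a multi-threaded run against a single-threaded reference, profile by profile. The sketch below assumes the brightness temperatures are available as nested lists indexed [profile][channel]; `mismatched_profiles` is a hypothetical helper and the tolerance is an arbitrary placeholder, neither comes from the pycrtm test suite.

```python
def mismatched_profiles(reference, candidate, tol=1e-6):
    """Return indices of profiles whose brightness temperatures differ.

    reference and candidate are sequences of per-profile channel lists,
    e.g. from an OMP_NUM_THREADS=1 run and an OMP_NUM_THREADS>1 run.
    tol is a hypothetical absolute tolerance in kelvin.
    """
    if len(reference) != len(candidate):
        raise ValueError("runs contain a different number of profiles")
    bad = []
    for i, (ref_row, cand_row) in enumerate(zip(reference, candidate)):
        # A profile is flagged if the channel count differs or if any
        # channel is off by more than the tolerance.
        if len(ref_row) != len(cand_row) or any(
            abs(r - c) > tol for r, c in zip(ref_row, cand_row)
        ):
            bad.append(i)
    return bad
```

Because the failure is intermittent, running this comparison over many repeated runs and thread counts would probably say more than a single pass.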

For the second option (invasive), I think there is some sort of memory leak, or something like one, with gomp or the combination of Python/f2py/gomp, as I don't have the same problem using the Intel compiler.

feature/bmk_restore_openmp_simple does not work. In the singularity container I get the error we saw initially, something like:

 CRTM_Geometry_IsValid(INFORMATION) : Invalid FOV index. Must be > 0.
 CRTM_Forward(FAILURE) : Input data check failed for profile #1
 CRTM_Forward(FAILURE) : 1 profiles failed
 Error CALLing CRTM_Forward.
 CRTM_Geometry_IsValid(INFORMATION) : Invalid FOV index. Must be > 0.
 CRTM_Forward(FAILURE) : Input data check failed for profile #1
 CRTM_Forward(FAILURE) : 1 profiles failed
 Error CALLing CRTM_Forward.