“Segmentation fault - invalid memory reference” when using OpenMP
mxdzsy1228 opened this issue · 5 comments
I can pass the magic_wizard.py tests with an MPI-only build. However, whenever I try to run the hybrid MPI+OpenMP code, a “Segmentation fault” error occurs.
Here is how I compile and run the dynamo_benchmark sample:
$ export FC=mpifort
$ export CC=mpicc
$ cmake .. -DUSE_FFTLIB=JW -DUSE_LAPACKLIB=JW
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/mpicc
-- Check for working C compiler: /usr/bin/mpicc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The Fortran compiler identification is GNU 7.3.0
-- Check for working Fortran compiler: /usr/bin/mpifort
-- Check for working Fortran compiler: /usr/bin/mpifort -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /usr/bin/mpifort supports Fortran 90
-- Checking whether /usr/bin/mpifort supports Fortran 90 -- yes
-- Could not find hardware support for AVX2 on this machine.
-- Set architecture to '64'
-- Set precision to 'dble'
-- Set output precision to 'sngl'
-- Use MPI
-- Try OpenMP Fortran flag = [-qopenmp]
-- Try OpenMP Fortran flag = [-fopenmp]
-- Found OpenMP_Fortran: -fopenmp
-- Use OpenMP
-- Use 'JW' for the FFTs
-- Use 'JW' for the LU factorisations
-- FFTW3: '/usr/lib/x86_64-linux-gnu/libfftw3.so'
-- FFTW3_OMP: '/usr/lib/x86_64-linux-gnu/libfftw3_omp.so'
-- Use SHTNS: no
-- Compilation flags: -fopenmp -m64 -std=f2008 -g -fbacktrace -fconvert=big-endian -cpp
-- Optimisation flags: -O3 -march=native
-- Configuring done
-- Generating done
-- Build files have been written to: /home/wzzhan/magic/openmp
$ make -j
Scanning dependencies of target magic.exe
……
[100%] Built target magic.exe
$ export OMP_NUM_THREADS=2
$ export KMP_AFFINITY=verbose,granularity=core,compact,1
$ mpiexec -n 2 ./magic.exe input.nml
!--- Program MagIC 5.6 ---!
! Start time: 2019/04/01 17:01:55
! Reading grid parameters!
! Reading control parameters!
! Reading physical parameters!
! Reading start information!
! Reading output information!
! Reading inner core information!
! Reading mantle information!
! Reading B external parameters!
! No B_external namelist found!
0: lmStartB= 1, lmStopB= 77
1: lmStartB= 78, lmStopB= 153
Using rIteration type: rIterThetaBlocking_OpenMP_t
! Uneven load balancing in LM blocks!
! Load percentage of last block: 98.701298701298697
0: lmStartB= 1, lmStopB= 77
1: lmStartB= 78, lmStopB= 153
Using snake ordering.
1 1 77 77
2 78 153 76
rank no 0 has l1m0 in block 1
!-- Blocking information:
! Number of LM-blocks: 2
! Size of LM-blocks: 77
! nThreads: 2
! Number of theta blocks: 2
! size of theta blocks: 12
! ideal size (nfs): 12
Using rIteration type: rIterThetaBlocking_OpenMP_t
! Const. entropy at outer boundary S = -1.091314E-01
! Const. entropy at inner boundary S = 8.908686E-01
! Total vol. buoy. source = 0.000000E+00
-----> rank 0 has 7384296 B allocated
……
! Using dtMax time step: 1.000000E-04
! NO STARTFILE READ, SETTING Z10!
! Entropy initialized at mode: l= 4 m= 4 Ampl= 0.10000
! Only l=m=0 comp. in tops:
! Self consistent dynamo integration.
! Normalized OC moment of inertia: 1.436464E+01
! Normalized IC moment of inertia: 7.584414E-02
! Normalized MA moment of inertia: 2.848460E+02
! Normalized IC volume : 6.539622E-01
! Normalized OC volume : 1.459880E+01
! Normalized IC surface : 3.643504E+00
! Normalized OC surface : 2.974289E+01
! Grid parameters:
n_r_max = 33 = number of radial grid points
n_cheb_max = 31
max cheb deg.= 30
n_phi_max = 48 = no of longitude grid points
n_theta_max = 24 = no of latitude grid points
n_r_ic_max = 17 = number of radial grid points in IC
n_cheb_ic_max= 14
max cheb deg = 28
l_max = 16 = max degree of Plm
m_max = 16 = max oder of Plm
lm_max = 153 = no of l/m combinations
minc = 1 = longitude symmetry wave no
nalias = 20 = spher. harm. deal. factor
! STARTING TIME INTEGRATION AT:
start_time = 0.0000000000E+00
step no = 0
start dt = 1.0000E-04
start dtNew= 1.0000E-04
! Starting time integration!
! BUILDING MATRICIES AT STEP/TIME: 1 1.000000E-04
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7ff752ffb2da in ???
#1 0x7ff752ffa503 in ???
#2 0x7ff75242ef1f in ???
#3 0x7ff753bac1f2 in __updates_mod_MOD_updates._omp_fn.8
at /home/wzzhan/magic/src/updateS.f90:268
#4 0x7ff752a20888 in ???
#5 0x7ff752a2910f in ???
#6 0x7ff753bab5de in __updates_mod_MOD_updates._omp_fn.6
at /home/wzzhan/magic/src/updateS.f90:189
#7 0x7ff752a1dece in ???
#8 0x7ff753bae4de in __updates_mod_MOD_updates
at /home/wzzhan/magic/src/updateS.f90:189
#9 0x7ff753a1336e in __lmloop_mod_MOD_lmloop
at /home/wzzhan/magic/src/LMLoop.f90:207
#10 0x7ff753b91698 in __step_time_mod_MOD_step_time
at /home/wzzhan/magic/src/step_time.f90:1250
#0 0x7f8bbabfb2da in ???
#1 0x7f8bbabfa503 in ???
#2 0x7f8bba02ef1f in ???
#11 0x7ff753a11607 in magic
at /home/wzzhan/magic/src/magic.f90:367
#12 0x7ff753a10f08 in main
at /home/wzzhan/magic/src/magic.f90:89
#3 0x7f8bbb7e8205 in __algebra_MOD_cgeslml
at /home/wzzhan/magic/src/algebra.f90:178
==================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 4652 RUNNING AT DESKTOP-ER7J2RN
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
==================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
I use MPICH to compile this code. However, even when I switch to Open MPI 4.0.1 or Open MPI 3.0.1 on another workstation, it still fails the same way. I don't know OpenMP well, so I'm unable to figure out where the problem is.
Maybe try disabling OpenMP by passing -DUSE_OMP=no at the cmake stage.
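For example, reusing the flags from the configure line above (starting from a clean build directory is safest):
$ cmake .. -DUSE_OMP=no -DUSE_FFTLIB=JW -DUSE_LAPACKLIB=JW
$ make -j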
Ok, sorry, I missed that you had already tried that. MagIC won't work in its hybrid form with Open MPI because of its poor support for thread safety. Some MPICH versions should work, but frankly, on a workstation you should disable OpenMP; it is only worth it on clusters with Intel MPI.
Thank you for your answer. But the hybrid version still seems hard to get working on the HPC system I'm using. I would like to rebuild the Intel MPI environment later.
I saw this line in the PDF documentation: “the n_r_max-1 must be a multiple of <n_mpi>, where n_r_max is the number of radial grid points.” So at first I thought that the number of CPU cores had to be smaller than n_r_max. But Table B.4 in
“Performance benchmarks for a next generation numerical dynamo model”, Geochemistry, Geophysics, Geosystems, 2016, 17(5):1586-1607,
shows a case where MagIC5 runs a (192, 384, 768) resolution with 512 CPU cores. So I wonder whether using OpenMP makes it possible to run a case on many CPU cores even with few radial grid points. Is that right, or did I misread something?
No, you're correct: the number of MPI ranks is limited to n_r_max-1, but you can indeed multiply this by a number of OpenMP threads per rank. That being said, only Intel MPI and MPICH have proper support for the MPI_THREAD_MULTIPLE level used in MagIC.
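To put numbers on the rank/thread arithmetic (purely illustrative values, not the benchmark resolution): with n_r_max = 65 you could use at most 64 MPI ranks, and running 8 OpenMP threads per rank would then occupy 64 × 8 = 512 cores in total.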
With the latest version of the code, we're back to MPI_THREAD_FUNNELED, so Open MPI should work fine too.
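If you want to check which thread level your MPI library actually provides, a minimal standalone diagnostic (just a sketch, not part of MagIC; the program name is arbitrary) could look like this:

program check_mpi_threads
   use mpi
   implicit none
   integer :: required, provided, ierr, rank

   ! Request the strongest thread level and see what the library grants
   required = MPI_THREAD_MULTIPLE
   call MPI_Init_thread(required, provided, ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   if ( rank == 0 ) then
      if ( provided >= required ) then
         write(*,*) 'MPI_THREAD_MULTIPLE is supported'
      else
         write(*,*) 'Provided thread level is only ', provided
      end if
   end if

   call MPI_Finalize(ierr)
end program check_mpi_threads

Compile it with mpifort and run it with mpiexec: if the provided level comes back lower than MPI_THREAD_MULTIPLE, the hybrid build of the older code versions is expected to be fragile with that MPI library, whereas MPI_THREAD_FUNNELED is enough for the latest version.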