HPDCS/ROOT-Sim

openmpi: Intel Omnipath (psm2) library causes segfault

Closed this issue · 6 comments

Problem

Although multithreaded MPI support (MPI_THREAD_MULTIPLE) has been enabled by default since Open MPI 3.0.0, some of the communication sub-libraries still do not fully support it.

In particular, ROOT-Sim triggers a bug in the Intel Omnipath (psm2) communication library.

Reproducibility

To reproduce the bug, ROOT-Sim must be configured with MPI support (--enable-mpi), and the Open MPI installation in use must have both the psm2 module and multithreading support enabled (the default since version 3.0.0).
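
Whether a given installation meets both conditions can be checked from the `ompi_info` output. The following is a minimal sketch, not part of ROOT-Sim: `affected_build` is a hypothetical helper name, and the two patterns it looks for are taken from the `ompi_info` dump reported in this issue, so other Open MPI versions may format these lines differently.

```shell
#!/bin/sh
# Hypothetical helper: reads `ompi_info` output on stdin and succeeds (exit 0)
# only if the build has the psm2 MTL component AND MPI_THREAD_MULTIPLE enabled.
affected_build() {
    info=$(cat)
    # The psm2 MTL component must be listed...
    printf '%s\n' "$info" | grep -Eq 'MCA mtl: psm2( |$)' || return 1
    # ...and multithreaded MPI support must be reported as enabled.
    printf '%s\n' "$info" | grep -q 'MPI_THREAD_MULTIPLE: yes'
}

# Usage (requires Open MPI on PATH):
#   ompi_info | affected_build && echo "psm2 + multithreading: workaround advised"
```

A successful exit only says that the preconditions hold; it does not prove that the installed version still contains the bug.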

Workaround

The Intel Omnipath module can be disabled:

  • At runtime, by passing the following parameters to the mpirun command:
mpirun --mca mtl ^psm2 [EXECUTABLE] [ARGS]
  • At runtime, by setting the following environment variable:
OMPI_MCA_mtl=^psm2
  • At build time, by passing the following flag when configuring Open MPI, before compiling it:
--without-psm2
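
The environment-variable variant lends itself to a small launcher script. The sketch below is only an illustration: `exclude_psm2` is a hypothetical helper, and it assumes that a value of `OMPI_MCA_mtl` already set by the user should be left alone. The value is quoted because some shells give `^` a special meaning.

```shell
#!/bin/sh
# Hypothetical helper: disable the psm2 MTL for every mpirun started from this shell.
exclude_psm2() {
    # Assumption: respect a value the user has already chosen.
    if [ -z "${OMPI_MCA_mtl:-}" ]; then
        OMPI_MCA_mtl='^psm2'
    fi
    export OMPI_MCA_mtl
}

exclude_psm2
echo "OMPI_MCA_mtl=${OMPI_MCA_mtl}"

# Then launch as usual, e.g.:
#   mpirun --np 2 ./model --np 2 --nprc 4
```

Exporting the variable affects only processes started from the same shell, so the setting can be dropped again once a fixed Open MPI build is installed.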

Reproducibility Context

ROOT-Sim configuration:

CPU Architecture.......... : x86_64 (available supports: rdtcs SSE3 SSSE3 SSE4.2 )
Operating System.......... : linux-gnu
Debugging Support......... : Enabled
MPI....................... : Enabled (compiler: mpicc)
User-Level Threads........ : Disabled
Parallel Allocator........ : Enabled (By default, use --disable-allocator if not wanted)
NUMA Subsystem............ : Disabled (manually excluded)
LP Preemption Support..... : Disabled (User-Level Threads are necessary)
LP Rebinding.............. : Disabled (manually excluded)
Linux Kernel Modules...... : Disabled (enable with --enable-modules)
Event Cross State......... : Disabled (Requires Linux Kernel Modules)

OMPI_INFO

                 Package: Open MPI Distribution
                Open MPI: 3.0.0
  Open MPI repo revision: v3.0.0
   Open MPI release date: Sep 12, 2017
                Open RTE: 3.0.0
  Open RTE repo revision: v3.0.0
   Open RTE release date: Sep 12, 2017
                    OPAL: 3.0.0
      OPAL repo revision: v3.0.0
       OPAL release date: Sep 12, 2017
                 MPI API: 3.1.0
            Ident string: 3.0.0
                  Prefix: ~/build/mpi/builds/ompi-3.0.0-debug
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: login1
           Configured by: **
           Configured on: Thu Jan 25 14:06:18 CET 2018
          Configure host: login1
  Configure command line: '--prefix=~/build/mpi/builds/ompi-3.0.0-debug'
                          '--enable-debug' '--enable-mem-debug'
                          '--enable-mem-profile' '--enable-memchecker'
                          '--enable-picky' '--enable-pretty-print-stacktrace'
                          '--enable-mpi-interface-warning'
                          '--disable-mpi-fortran' '--disable-mpi-cxx'
                          '--disable-mpi-java' '--with-psm2'
                          '--with-libfabric-libdir=/apps/LIBFABRIC/1.4.2/lib'
                          '--with-slurm' '--with-pmi'
                          '--with-oshmem-param-check'
                          '--with-mpi-param-check'
                Built by: **
                Built on: Thu Jan 25 14:31:37 CET 2018
              Built host: login1
              C bindings: yes
            C++ bindings: no
             Fort mpif.h: no
            Fort use mpi: no
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: no
 Fort mpi_f08 compliance: The mpi_f08 module was not built
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /apps/GCC/7.2.0/bin/gcc
  C compiler family name: GNU
      C compiler version: 7.2.0
            C++ compiler: g++
   C++ compiler absolute: /apps/GCC/7.2.0/bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /apps/GCC/7.2.0/bin/gfortran
         Fort ignore TKR: no
   Fort 08 assumed shape: no
      Fort optional args: no
          Fort INTERFACE: no
    Fort ISO_FORTRAN_ENV: no
       Fort STORAGE_SIZE: no
      Fort BIND(C) (all): no
      Fort ISO_C_BINDING: no
 Fort SUBROUTINE BIND(C): no
       Fort TYPE,BIND(C): no
 Fort T,BIND(C,name="a"): no
            Fort PRIVATE: no
          Fort PROTECTED: no
           Fort ABSTRACT: no
       Fort ASYNCHRONOUS: no
          Fort PROCEDURE: no
         Fort USE...ONLY: no
           Fort C_FUNLOC: no
 Fort f08 using wrappers: no
         Fort MPI_SIZEOF: no
             C profiling: yes
           C++ profiling: no
   Fort mpif.h profiling: no
  Fort use mpi profiling: no
   Fort use mpi_f08 prof: no
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
  Internal debug support: yes
  MPI interface warnings: yes
     MPI parameter check: always
Memory profiling support: yes
Memory debugging support: yes
              dl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
         MPI I/O support: yes
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
          MPI extensions: affinity, cuda
   FT Checkpoint support: no (checkpoint thread: no)
   C/R Enabled Debugging: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.0.0)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.0.0)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.0.0)
            MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v3.0.0)
            MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA crs: none (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v3.0.0)
               MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
               MCA hwloc: hwloc1117 (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v3.0.0)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v3.0.0)
          MCA memchecker: valgrind (MCA v2.1.0, API v2.0.0, Component v3.0.0)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v3.0.0)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v3.0.0)
                MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA pmix: s2 (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA pmix: pmix2x (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v3.0.0)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v3.0.0)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v3.0.0)
                 MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v3.0.0)
                 MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v3.0.0)
              MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                          v3.0.0)
              MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                          v3.0.0)
              MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                          v3.0.0)
              MCA errmgr: dvm (MCA v2.1.0, API v3.0.0, Component v3.0.0)
              MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                          v3.0.0)
                 MCA ess: env (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                          v3.0.0)
                 MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v3.0.0)
               MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v3.0.0)
             MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v3.0.0)
            MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v3.0.0)
                MCA odls: default (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
               MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
               MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
               MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
               MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v3.0.0)
              MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v3.0.0)
              MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v3.0.0)
              MCA routed: debruijn (MCA v2.1.0, API v3.0.0, Component v3.0.0)
              MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v3.0.0)
              MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v3.0.0)
              MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v3.0.0)
              MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v3.0.0)
              MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v3.0.0)
               MCA state: tool (MCA v2.1.0, API v1.0.0, Component v3.0.0)
               MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v3.0.0)
               MCA state: orted (MCA v2.1.0, API v1.0.0, Component v3.0.0)
               MCA state: novm (MCA v2.1.0, API v1.0.0, Component v3.0.0)
               MCA state: app (MCA v2.1.0, API v1.0.0, Component v3.0.0)
               MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v3.0.0)
                 MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA coll: self (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v3.0.0)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                  MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA mtl: psm2 (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v3.0.0)
                 MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.0.0)
                 MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v3.0.0)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.0.0)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v3.0.0)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.0.0)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v3.0.0)

Stack trace:

mpirun --np 2 ./model --np 2 --nprc 4
****************************
*  ROOT-Sim Configuration  *
****************************
Kernels: 2
Cores: 48 available, 2 used
Number of Logical Processes: 4
Output Statistics Directory: outputs
Scheduler: 0
MPI multithread support: yes
GVT Time Period: 1.00 seconds
Checkpointing Type: 2
Checkpointing Period: 10
Snapshot Reconstruction Type: 2001
Halt Simulation After: 0
LPs Distribution Mode across Kernels: 0
Check Termination Mode: 0
Blocking GVT: 0
Set Seed: 0
Initializing LPs... done
Running a traditional loop-based PHOLD benchmark with counter set to 1000, 10000 total events per LP
****************************
*    Simulation Started    *
****************************
model:~/build/mpi/openmpi-3.0.0/ompi/mca/pml/cm/pml_cm_sendreq.c:57: mca_pml_cm_send_request_completion: Assertion `0 == ((mca_pml_cm_thin_send_request_t*) base_request)->req_send.req_base.req_pml_complete' failed.
[login1:445014] *** Process received signal ***
[login1:445014] Signal: Aborted (6)
[login1:445014] Signal code:  (-6)
[login1:445014] [ 0] /lib64/libpthread.so.0(+0x10b20)[0x2b66393fcb20]
[login1:445014] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b663963d8c7]
[login1:445014] [ 2] /lib64/libc.so.6(abort+0x13a)[0x2b663963ec9a]
[login1:445014] [ 3] /lib64/libc.so.6(+0x2d856)[0x2b6639636856]
[login1:445014] [ 4] /lib64/libc.so.6(+0x2d902)[0x2b6639636902]
[login1:445014] [ 5] ~/build/mpi/builds/ompi-3.0.0-debug/lib/openmpi/mca_pml_cm.so(mca_pml_cm_send_request_completion+0x57)[0x2b664948105b]
[login1:445014] [ 6] ~/build/mpi/builds/ompi-3.0.0-debug/lib/openmpi/mca_mtl_psm2.so(ompi_mtl_psm2_progress+0x1c6)[0x2b664968b8b9]
[login1:445014] [ 7] ~/build/mpi/builds/ompi-3.0.0-debug/lib/libopen-pal.so.40(opal_progress+0xa9)[0x2b6639cfc69f]
[login1:445014] [ 8] ~/build/mpi/builds/ompi-3.0.0-debug/lib/libopen-pal.so.40(sync_wait_mt+0x18e)[0x2b6639d04d16]
[login1:445014] [ 9] ~/build/mpi/builds/ompi-3.0.0-debug/lib/openmpi/mca_pml_cm.so(+0x2e24)[0x2b664947ae24]
[login1:445014] [10] ~/build/mpi/builds/ompi-3.0.0-debug/lib/openmpi/mca_pml_cm.so(+0x3dcf)[0x2b664947bdcf]
[login1:445014] [11] ~/build/mpi/builds/ompi-3.0.0-debug/lib/libmpi.so.40(MPI_Recv+0x2ce)[0x2b66390e734d]
[login1:445014] [12] ./model(receive_remote_msgs+0x106)[0x419101]
[login1:445014] [13] ./model[0x40a250]
[login1:445014] [14] ./model[0x40be74]
[login1:445014] [15] /lib64/libpthread.so.0(+0x8724)[0x2b66393f4724]
[login1:445014] [16] /lib64/libc.so.6(clone+0x6d)[0x2b66396f2c1d]
[login1:445014] *** End of error message ***

@ael-code I'm assigning this to you, so that you can track this once they fix this upstream.

I've added the spaces to help stupid search engines index the issue. Maybe they are not needed in 2018 :)

I guess not 😃

Hi RooT-Sim guys,

This issue was fixed in #4346 and ported to the release branches: #4394 and #4395. However, the fix did not make it into v3.0.0. Please try 3.0.1rc2.

@matcabral
I can confirm that the bug no longer occurs on 3.0.1rc2 with psm2 enabled. Thanks.

Thanks @matcabral for pointing us to the right version. I'm closing this now.