Problem importing bmtk.analyzer.compartment
moravveji opened this issue · 2 comments
I have pip-installed BMTK version 1.0.8 on our HPC cluster, which runs Rocky 8 on Intel Ice Lake CPUs.
When I start an interactive job with 16 tasks, I fail to import the bmtk.analyzer.compartment package:
$ nproc
16
$ module use /apps/leuven/rocky8/icelake/2022b/modules/all
$ module load BMTK/1.0.8-foss-2022b
$ python
Python 3.10.8 (main, Jul 13 2023, 22:10:28) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bmtk
>>> import bmtk.analyzer.compartment
[m28c27n1:3237025] OPAL ERROR: Unreachable in file ext3x_client.c at line 112
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:
version 16.05 or later: you can use SLURM's PMIx support. This
requires that you configure and build SLURM --with-pmix.
Versions earlier than 16.05: you must use either SLURM's PMI-1 or
PMI-2 support. SLURM builds PMI-1 by default, or you can manually
install PMI-2. You must then build Open MPI using --with-pmi pointing
to the SLURM PMI library location.
Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[m28c27n1:3237025] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
I have built the BMTK/1.0.8-foss-2022b module (and all its dependencies) against the OpenMPI/4.1.4-GCC-12.2.0 module. However, this specific OpenMPI module is not built with Slurm support, which is why parallel applications launched using srun emit the OPAL error message above.
I would like to ask whether there is an environment variable that controls how the tasks are launched, so that I can use mpirun directly instead of srun.
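For reference, this is roughly how I inspected the OpenMPI build (assuming the ompi_info tool from the loaded module is on the PATH; a build configured with Slurm/PMI support should list the corresponding components, whereas ours lists none):
$ module load OpenMPI/4.1.4-GCC-12.2.0
$ ompi_info | grep -i -E 'slurm|pmi'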
Hi @moravveji, BMTK itself does not directly call srun or mpirun. It uses the standard mpi4py library, which relies on your locally installed version of OpenMPI. We've run large bmtk simulations using both Moab/Torque and Slurm, although how to actually execute them will be different for each cluster.
One thing to try is to create a python script and run it directly from the prompt using mpirun (or mpiexec):
$ mpirun -np 16 python my_bmtk_script.py
Unfortunately, whatever you do will no longer be interactive, and I don't think you can start up a shell using mpirun (or at least I've never seen it done before). If you're using Moab I think you can use the qsub -I option to get an interactive shell, but I haven't tried it myself.
Another option is to use or compile a different version of OpenMPI. If you have access to anaconda, it might be worth creating a test environment and installing OpenMPI or MPICH2 there. I believe that when it installs it will try to find the appropriate workload-manager options on the system, and if there is a Slurm manager on your HPC it will install with PMI support, although in my experience this doesn't always work, especially if Slurm is installed in a non-standard way.
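For example, a rough sketch of such a test environment (the environment name is just a placeholder, and the exact packages/versions that work will depend on your cluster):
$ conda create -n bmtk_test -c conda-forge python=3.10 mpich mpi4py
$ conda activate bmtk_test
$ pip install bmtk
$ mpirun -np 4 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"
If the last command prints ranks 0-3, the MPI stack in that environment is at least self-consistent, and you can then test the bmtk import under the same mpirun.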
Thanks @kaeldai for your comments.
I can already share a few thoughts based on our recent trial-and-error tests:
- When installing bmtk via conda, the MPICH2 implementation of MPI is downloaded by default, which does not actually pick up the local scheduler (Slurm).
- The mpi4py build from the Intel channel, however, does correctly pick up Slurm. Unfortunately, the dependency requirements of the other tools in the bmtk environment could not be fully satisfied, because not all the necessary packages are consistently available from Intel's (ana)conda channel; hence, that was a no-go for us.
- Instead, I tried to import the bmtk.analyzer.compartment package via a batch job (i.e. using sbatch); a minimal sketch of such a job script is shown after this list. This time, the OpenMPI runtime properly spawns the processes and the error above no longer appears. The reason for this behavior is that our build of Slurm does support PMI-2, but our OpenMPI was not configured to make use of PMI support. As a result, interactive jobs/tasks launched via srun fail with the error message above.
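For completeness, this is roughly what our batch script looks like (the walltime and the script name are placeholders for our actual values):
#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --time=01:00:00
module use /apps/leuven/rocky8/icelake/2022b/modules/all
module load BMTK/1.0.8-foss-2022b
mpirun -np 16 python my_bmtk_script.py
Inside the batch allocation, mpirun launches the tasks itself, so the processes are not direct-launched by srun and the PMI error above is avoided.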
So, the take-home message is to avoid using bmtk in an interactive session when OpenMPI is not compiled with PMI{2,x} support.