Darshan measurement fails on MPI_Comm_free
Closed this issue · 2 comments
Hello,
I am using Darshan 3.4.4 (runtime) to instrument an MPI application built against intel-oneapi-mpi/2021.10.0. The measurement fails with the following error:
Abort(873021445) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Comm_free: Invalid communicator, error stack:
PMPI_Comm_free(135): MPI_Comm_free(comm=0xe0dbc60) failed
PMPI_Comm_free(83).: Invalid communicator
I have installed Darshan with Spack as darshan-runtime+hdf5+parallel-netcdf; the profiled code has no issues when run without instrumentation, and the Darshan install works with simpler codes. I am wondering whether the MPI_Comm_free API is supported or not?
Thank you,
Laura
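For context (the report above does not say exactly how the instrumentation was applied): darshan-runtime is usually attached to a dynamically linked MPI executable by preloading its shared library. A minimal sketch, with the install path and executable name as placeholders:

# Sketch only: /path/to/darshan-install and ./my_mpi_app are placeholders.
export LD_PRELOAD=/path/to/darshan-install/lib/libdarshan.so
mpiexec -n 4 ./my_mpi_app
# Darshan writes its log file when the application calls MPI_Finalize.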
Are you sure both the application and darshan-runtime are built against the same MPI (intel-oneapi-mpi/2021.10.0)? I don't think I've tested this particular MPI yet, but generally speaking we like to ensure that Darshan and the application are using the same MPI implementation.
What system is this on? Is it possible for you to share some code that demonstrates the problem so that I could try to reproduce and debug myself?
Darshan shouldn't be affected at all by application usage of MPI communicators, so I think something else odd is going on.
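One hedged way to check for an MPI mismatch with a dynamically linked application (paths are placeholders; this assumes libdarshan.so itself links against MPI, as in an MPI-enabled darshan-runtime build):

# Compare which libmpi the application and the Darshan library resolve to.
ldd ./my_mpi_app | grep -i libmpi
ldd /path/to/darshan-install/lib/libdarshan.so | grep -i libmpi
# Different intel-oneapi-mpi paths in the two outputs indicate a mismatch.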
Hello, you were right. I noticed that the hash of the intel-oneapi-mpi installation used by the instrumented application was not the same as the hash of the intel-oneapi-mpi installation used for Darshan. Thank you.
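For anyone hitting the same symptom, a sketch of how such a mismatch can be spotted and fixed with Spack (the spec mirrors the one above; adjust to your environment):

# List installed intel-oneapi-mpi instances with their hashes.
spack find -l intel-oneapi-mpi
# Show darshan-runtime along with the hashes of its dependencies.
spack find -ld darshan-runtime
# If they differ from the MPI used to build the application, rebuild Darshan
# against that MPI (a specific installed instance can be pinned with ^/<hash>).
spack install darshan-runtime+hdf5+parallel-netcdf ^intel-oneapi-mpi@2021.10.0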