LLNL/Caliper

Incompatibility between Caliper and HDF5 (with MPIO) on Lassen

wrtobin opened this issue · 1 comments

I'm working on some code on Lassen (using upstream clang and Spectrum MPI), and when I do:

  cali::Function __cali_ann##__func__(timingHelpers::stripPF(__PRETTY_FUNCTION__).c_str());
  hid_t fapl_id = H5Pcreate( H5P_FILE_ACCESS );
  H5Pset_fapl_mpio( fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL );
  hid_t file_id = H5Fcreate( "empty", H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id );
  H5Pclose( fapl_id );
  H5Fclose( file_id );

( Where timingHelpers::stripPF(__PRETTY_FUNCTION__) produces nicely-formatted function strings. )

I get an exception when closing the file, inside PMPI_File_write_at (under wrap_MPI_File_write_at).

If I remove the Caliper annotation, it works fine. It also works fine if I don't call H5Pset_fapl_mpio, but that call is required to use HDF5 to access the same file from multiple processes.

I've additionally tried setting CALIPER_MPI_BLACKLIST=MPI_File_write_at (as well as a list including all MPI_File_XXX calls), but that didn't affect things.

According to the Caliper documentation, the MPI I/O routines aren't wrapped, so I'm surprised I'm running into this.

Any help would be awesome, thanks.

Hi @wrtobin,

Thanks for reporting this. I can reproduce the bug, and I think I have an idea of what might be causing it. I'll keep you posted.

The Caliper MPI wrappers do indeed wrap every MPI function for basic timing; we just don't collect more detailed data (e.g., for message tracing) for MPI I/O. In the meantime, you can try setting CALI_MPI_WHITELIST to a list of the MPI functions you explicitly do want to capture. Just steer clear of anything I/O related.
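
As a rough illustration of that workaround, here is a minimal sketch that builds a whitelist with every MPI_File_* routine filtered out and exports it as CALI_MPI_WHITELIST. The helper names (is_mpi_io_function, build_mpi_whitelist, export_mpi_whitelist) are hypothetical and not part of Caliper. The sketch also assumes the variable accepts a comma-separated list and that it is set before Caliper reads its configuration, so check both against the Caliper documentation for your version.

```cpp
#include <cstdlib>   // setenv (POSIX; available on Linux systems such as Lassen)
#include <string>
#include <vector>

// Hypothetical helper: true for MPI I/O routine names, i.e. names that
// start with "MPI_File_".
bool is_mpi_io_function(const std::string& name) {
    return name.rfind("MPI_File_", 0) == 0;
}

// Hypothetical helper: joins the requested function names into one list,
// skipping MPI I/O routines. Assumption: Caliper accepts ',' as separator.
std::string build_mpi_whitelist(const std::vector<std::string>& wanted) {
    std::string list;
    for (const std::string& name : wanted) {
        if (is_mpi_io_function(name)) continue;  // stay clear of I/O routines
        if (!list.empty()) list += ',';
        list += name;
    }
    return list;
}

// Hypothetical helper: exports the filtered list as CALI_MPI_WHITELIST.
// Assumption: this runs before Caliper initializes its MPI service.
// Returns true if the environment variable was set successfully.
bool export_mpi_whitelist(const std::vector<std::string>& wanted) {
    const std::string list = build_mpi_whitelist(wanted);
    return setenv("CALI_MPI_WHITELIST", list.c_str(), 1) == 0;
}
```

Calling export_mpi_whitelist({"MPI_Send", "MPI_Recv", "MPI_File_write_at"}) early in main would leave MPI_File_write_at out of the exported list. Exporting the same variable in the job script before launching the application should be equivalent and avoids any ordering concerns inside the program.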