LLNL/Caliper

Caliper doesn't work on Summit for me

yaoyi92 opened this issue · 2 comments

Hello, I have used Caliper on normal Intel CPU machines, and it worked perfectly.

However, when I try to use it on the Summit IBM+GPU machine, I get the error below. I don't understand what the difference in MPI between the machines is. I have also pasted my submission script.

===
== CALIPER: (7): runtime-report: mpireport: MPI is already finalized. Cannot aggregate output.
== CALIPER: (6): runtime-report: mpireport: MPI is already finalized. Cannot aggregate output.
== CALIPER: (31): runtime-report: mpireport: MPI is already finalized. Cannot aggregate output.
[... the same message is printed by every one of the 36 MPI ranks ...]
===

Submission script:

===

#!/bin/bash

#BSUB -P MAT240
#BSUB -W 2:00
#BSUB -nnodes 1
#BSUB -alloc_flags gpumps
#BSUB -J aims-gw
#BSUB -o aims.%J
#BSUB -N yy244@duke.edu
#BSUB -q debug

module purge
#module load gcc/7.4.0 spectrum-mpi/10.3.1.2-20200121  cuda/10.1.243 essl/6.1.0-2 netlib-lapack/3.8.0 netlib-scalapack/2.0.2
module load gcc/7.5.0 spectrum-mpi/10.4.0.3-20210112 cuda/10.1.243 essl/6.1.0-2 netlib-lapack/3.8.0 netlib-scalapack/2.1.0
module load nsight-systems/2021.3.1.54

bin=/ccs/home/yaoyi92/fhiaims/FHIaims/build_gpu_caliper_fft/aims.211010.scalapack.mpi.x

export OMP_NUM_THREADS=1

ulimit -s unlimited
export CALI_CONFIG=runtime-report
jsrun -n 2 -a 18 -c 18 -g 3 -r 2 $bin > aims.out
===

Hi @yaoyi92 ,

Is this a Fortran code? If so, Caliper's automatic wrapping of MPI_Finalize() unfortunately doesn't work, so it can't trigger its output aggregation at the right time.

The best way to get around this is to use Caliper's ConfigManager control API, which lets you configure and start/stop profiling in the code. There's a modified TeaLeaf_CUDA implementation (https://github.com/daboehme/TeaLeaf_CUDA/tree/dev/caliper-support) as an example; look specifically at https://github.com/daboehme/TeaLeaf_CUDA/blob/dev/caliper-support/tea_caliper.f90.
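
For reference, here is a minimal sketch of what that can look like in Fortran, loosely following Caliper's Fortran ConfigManager interface from caliper_mod. The program structure, region name, and error handling below are illustrative placeholders, not taken from your code:

===

program example
    use caliper_mod
    use mpi

    implicit none

    type(ConfigManager) :: mgr
    logical             :: have_error
    integer             :: ierr
    character(len=:), allocatable :: errmsg

    call MPI_Init(ierr)

    ! Build and start the profiling configuration in the code
    ! instead of relying on the CALI_CONFIG environment variable.
    mgr = ConfigManager_new()
    call mgr%add('runtime-report')
    have_error = mgr%error()
    if (have_error) then
        errmsg = mgr%error_msg()
        write(*,*) 'Caliper ConfigManager error: ', errmsg
    end if
    call mgr%start

    ! Annotate the regions you want timed.
    call cali_begin_region('work')
    ! ... actual computation ...
    call cali_end_region('work')

    ! Flush Caliper output explicitly, before MPI_Finalize,
    ! so the cross-rank aggregation can still use MPI.
    call mgr%flush()
    call ConfigManager_delete(mgr)

    call MPI_Finalize(ierr)
end program example
===

The important part is that mgr%flush() runs before MPI_Finalize(), so the cross-rank aggregation still has a live MPI environment to work with.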

As another workaround you can try skipping the across-MPI aggregation and just writing a report for each MPI rank. This config should produce report-0.txt, report-1.txt, and so on, one file per rank:

CALI_CONFIG="runtime-report,aggregate_across_ranks=false,profile.mpi,output=report-%mpi.rank%.txt"
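In your submission script above, this would simply replace the existing export CALI_CONFIG=runtime-report line before the jsrun call.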

Thank you @daboehme, it works for me! Yes, I am working on a Fortran code.