Caliper doesn't work on Summit for me
yaoyi92 opened this issue · 2 comments
Hello, I have used Caliper on normal Intel CPU machines, where it worked perfectly.
However, when I try to use it on Summit (IBM POWER + GPU), I get the error below. I don't understand what the difference in MPI between the machines is. I have also pasted my submission script.
===
== CALIPER: (0): runtime-report: mpireport: MPI is already finalized. Cannot aggregate output.
== CALIPER: (1): runtime-report: mpireport: MPI is already finalized. Cannot aggregate output.
[... the same message is printed by all 36 ranks ...]
===
Submission script:
===
#!/bin/bash
#BSUB -P MAT240
#BSUB -W 2:00
#BSUB -nnodes 1
#BSUB -alloc_flags gpumps
#BSUB -J aims-gw
#BSUB -o aims.%J
#BSUB -N yy244@duke.edu
#BSUB -q debug
module purge
#module load gcc/7.4.0 spectrum-mpi/10.3.1.2-20200121 cuda/10.1.243 essl/6.1.0-2 netlib-lapack/3.8.0 netlib-scalapack/2.0.2
module load gcc/7.5.0 spectrum-mpi/10.4.0.3-20210112 cuda/10.1.243 essl/6.1.0-2 netlib-lapack/3.8.0 netlib-scalapack/2.1.0
module load nsight-systems/2021.3.1.54
bin=/ccs/home/yaoyi92/fhiaims/FHIaims/build_gpu_caliper_fft/aims.211010.scalapack.mpi.x
export OMP_NUM_THREADS=1
ulimit -s unlimited
export CALI_CONFIG=runtime-report
jsrun -n 2 -a 18 -c 18 -g 3 -r 2 $bin > aims.out
===
Hi @yaoyi92 ,
Is this a Fortran code? In that case, Caliper's automatic wrapping of MPI_Finalize() unfortunately doesn't work, so it can't trigger its output aggregation at the right time.
The best way around this is Caliper's ConfigManager control API, which lets you configure and start/stop profiling from within the code. There's a modified TeaLeaf_CUDA implementation (https://github.com/daboehme/TeaLeaf_CUDA/tree/dev/caliper-support) as an example; look specifically at https://github.com/daboehme/TeaLeaf_CUDA/blob/dev/caliper-support/tea_caliper.f90.
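For a Fortran code, a minimal sketch of that workaround might look like the following. This is only an illustration modeled on the tea_caliper.f90 example linked above (it assumes Caliper was built with Fortran support and MPI; treat the exact names and signatures as approximate, not authoritative):

```fortran
program caliper_example
  use caliper_mod   ! Caliper's Fortran module
  use mpi
  implicit none

  type(ConfigManager) :: mgr
  integer :: ierr

  call MPI_Init(ierr)

  ! Create a ConfigManager and enable the runtime-report config
  ! explicitly, instead of relying on the CALI_CONFIG variable.
  mgr = ConfigManager_new()
  call mgr%add('runtime-report')
  call mgr%start

  call cali_begin_region('work')
  ! ... application work ...
  call cali_end_region('work')

  ! Flush Caliper output *before* MPI_Finalize, so the
  ! cross-rank aggregation can still use MPI.
  call mgr%flush()
  call ConfigManager_delete(mgr)

  call MPI_Finalize(ierr)
end program caliper_example
```

The key point is that mgr%flush() runs while MPI is still initialized, which avoids the "MPI is already finalized" error.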
As another workaround, you can skip the across-rank aggregation and write a separate report for each MPI rank. This config should produce report-0.txt, report-1.txt, and so on, one per rank:
CALI_CONFIG="runtime-report,aggregate_across_ranks=false,profile.mpi,output=report-%mpi.rank%.txt"