LLNL/Caliper

How to profile the timing of function calls inside an OpenMP loop

yaoyi92 opened this issue · 0 comments

Dear Caliper developers,

I am profiling an MPI+OpenMP code. In the OpenMP part, functions are called from inside the threads. What is the correct way to get a "runtime-report" that covers the function calls made inside the threads?

Here are my attempts.

If I run the default command CALI_CONFIG=runtime-report srun $SISSOPP,
I get

Path                             Min time/rank Max time/rank Avg time/rank Time %    
main                                  0.306833      0.316301      0.313905 23.891259 
  sis                                 0.014496      0.019343      0.018768  1.428431 
  make_shared_featurespace            0.000201      0.002033      0.000828  0.063019 
    generate_feature_space            0.004758      0.010833      0.008719  0.663625 
      generate_feats                  0.000660      0.006001      0.002509  0.190990 
        generate_non_param_feats      0.002194      0.003940      0.003267  0.248628 
generate_non_param_feats              0.002997      0.006621      0.004622  0.351750 

Here, generate_non_param_feats is called inside an OpenMP loop. In the code, it is always called from generate_feats. However, the output contains two generate_non_param_feats entries: one in the correct position, and another at the top level, outside of main.
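
For reference, a minimal sketch of the call structure described above. Only the two function names come from the report; the bodies, the argument lists, and the use of CALI_CXX_MARK_FUNCTION for the annotations are assumptions made for illustration. The no-op fallback macro is there only so the sketch builds without Caliper installed.

```cpp
#include <cassert>
#include <vector>

// Use the real Caliper macros when the header is available.
#if defined(__has_include)
#  if __has_include(<caliper/cali.h>)
#    include <caliper/cali.h>
#  endif
#endif
// Otherwise fall back to a no-op so the sketch still compiles (assumption).
#ifndef CALI_CXX_MARK_FUNCTION
#  define CALI_CXX_MARK_FUNCTION
#endif

// Called from inside the OpenMP loop and annotated as its own region.
// The body is placeholder work, not the real feature generation.
double generate_non_param_feats(int i)
{
    CALI_CXX_MARK_FUNCTION;
    assert(i >= 0);
    return 0.5 * i;
}

// Annotated on the thread that enters the function. The loop body then
// runs on several OpenMP threads, each of which opens the inner region.
double generate_feats(int n)
{
    CALI_CXX_MARK_FUNCTION;
    std::vector<double> out(n);

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        out[i] = generate_non_param_feats(i);

    // Serial reduction keeps the result independent of the thread count.
    double sum = 0.0;
    for (double v : out) sum += v;
    return sum;
}
```

In this layout the generate_feats region is opened only by the thread that enters the function, while generate_non_param_feats is opened by every thread that executes loop iterations. If annotations are tracked per thread, the worker threads would have no enclosing generate_feats or main region, which would be consistent with the extra top-level entry in the report.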

Following the notes at https://software.llnl.gov/Caliper/CaliperBasics.html#notes-on-multi-threading, I also tried CALI_CALIPER_ATTRIBUTE_DEFAULT_SCOPE=process CALI_CONFIG=runtime-report srun $SISSOPP. This time, I got the following result:

Path                                   Min time/rank Max time/rank Avg time/rank Time %    
main                                        0.302818      0.339568      0.327741 24.049391 
  sis                                       0.014559      0.018591      0.018061  1.192767 
  make_shared_featurespace                  0.000074      0.002291      0.000732  0.053706 
    generate_feature_space                  0.006720      0.011636      0.008599  0.630996 
      generate_feats                        0.000473      0.329705      0.100472  7.372592 
        generate_non_param_feats            0.002945      0.670026      0.270965 19.883256 
          generate_non_param_feats          0.001565      0.663913      0.366057 26.860994 
            generate_non_param_feats        0.000926      0.345460      0.269563 19.780335 
              generate_non_param_feats      0.000713      0.003053      0.001774  0.117179 

It seems Caliper is using the same tag for calls from different threads.
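
One possible reading of the second report, offered as an assumption rather than a confirmed description of Caliper internals: with process scope, begin/end events from all threads land on a single shared region stack, so overlapping calls from different threads appear nested inside each other. The toy model below does not use Caliper at all. The Event type, both functions, and the interleaving are hypothetical, and the four threads are chosen only to mirror the four nested levels in the report, not a known thread count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// One annotation event: which thread issued it, and whether it opens
// (begin == true) or closes (begin == false) a region.
struct Event { int thread; bool begin; };

// Deepest nesting observed when all threads share one region stack
// (toy stand-in for process-scope annotations).
std::size_t max_depth_shared(const std::vector<Event>& events)
{
    std::size_t depth = 0, deepest = 0;
    for (const Event& e : events) {
        if (e.begin) deepest = std::max(deepest, ++depth);
        else if (depth > 0) --depth;
    }
    return deepest;
}

// Deepest nesting observed when each thread keeps its own region stack
// (toy stand-in for thread-scope annotations).
std::size_t max_depth_per_thread(const std::vector<Event>& events)
{
    std::map<int, std::size_t> depth;
    std::size_t deepest = 0;
    for (const Event& e : events) {
        std::size_t& d = depth[e.thread];
        if (e.begin) deepest = std::max(deepest, ++d);
        else if (d > 0) --d;
    }
    return deepest;
}

// Hypothetical interleaving: four threads enter the same function
// before any of them leaves it.
std::vector<Event> overlapping_calls()
{
    return { {0, true}, {1, true}, {2, true}, {3, true},
             {3, false}, {2, false}, {1, false}, {0, false} };
}
```

For this interleaving the shared stack reaches a depth of 4, while the per-thread stacks never exceed a depth of 1. That would match a report in which generate_non_param_feats appears nested four levels deep even though the code never calls it recursively.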

Are there any best practices for timing different regions inside OpenMP threads?

Best wishes,
Yi