madgraph5/madgraph4gpu

Time profiles for DY+4j (and DY+3j) have a high 'python/bash' component - especially for cuda

Documenting/analysing further results of the DY+4j tests in #948.

This is an issue that I had already mentioned for cuda in DY+3j in #994. In DY+4j it is even larger, and it also becomes visible (to a lesser extent) for cpp and fortran. So I am splitting this off into a separate issue.

pp_dy4j.mad/fortran/output.txt (#events: 81)
[GridPackCmd.launch] OVERALL TOTAL    21707.6095 seconds
[madevent COUNTERS]  PROGRAM TOTAL    21546.1
[madevent COUNTERS]  Fortran Overhead 1579.09
[madevent COUNTERS]  Fortran MEs      19967
--------------------------------------------------------------------------------
pp_dy4j.mad/cppnone/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    26745.1639 seconds
[madevent COUNTERS]  PROGRAM TOTAL    26584.9
[madevent COUNTERS]  Fortran Overhead 1608.51
[madevent COUNTERS]  CudaCpp MEs      24910.4
[madevent COUNTERS]  CudaCpp HEL      66.0341
--------------------------------------------------------------------------------
pp_dy4j.mad/cppsse4/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    14398.4664 seconds
[madevent COUNTERS]  PROGRAM TOTAL    14231.3
[madevent COUNTERS]  Fortran Overhead 1647.03
[madevent COUNTERS]  CudaCpp MEs      12550.6
[madevent COUNTERS]  CudaCpp HEL      33.7035
--------------------------------------------------------------------------------
pp_dy4j.mad/cppavx2/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    7335.2356 seconds
[madevent COUNTERS]  PROGRAM TOTAL    7114.43
[madevent COUNTERS]  Fortran Overhead 1683.7
[madevent COUNTERS]  CudaCpp MEs      5415.48
[madevent COUNTERS]  CudaCpp HEL      15.2596
--------------------------------------------------------------------------------
pp_dy4j.mad/cpp512y/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    6831.8971 seconds
[madevent COUNTERS]  PROGRAM TOTAL    6649.98
[madevent COUNTERS]  Fortran Overhead 1669.94
[madevent COUNTERS]  CudaCpp MEs      4966.24
[madevent COUNTERS]  CudaCpp HEL      13.8066
--------------------------------------------------------------------------------
pp_dy4j.mad/cpp512z/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    7136.2962 seconds
[madevent COUNTERS]  PROGRAM TOTAL    6958.96
[madevent COUNTERS]  Fortran Overhead 1636.28
[madevent COUNTERS]  CudaCpp MEs      5305.14
[madevent COUNTERS]  CudaCpp HEL      17.5447
--------------------------------------------------------------------------------
pp_dy4j.mad/cuda/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    2523.7488 seconds
[madevent COUNTERS]  PROGRAM TOTAL    2234.93
[madevent COUNTERS]  Fortran Overhead 1820.36
[madevent COUNTERS]  CudaCpp MEs      97.9622
[madevent COUNTERS]  CudaCpp HEL      316.613
--------------------------------------------------------------------------------

Specifically, the python/bash component ("GridPackCmd OVERALL TOTAL" minus "madevent PROGRAM TOTAL") is

  • 290s (2520-2230) for cuda
  • 180s (6830-6650) for cpp512y
  • 160s (21710-21550) for fortran

So again, this non-ME component becomes relatively more disturbing/visible for the faster ME backends like cuda and simd: it is only about 0.7% of the overall total for fortran, but about 2.7% for cpp512y and over 11% for cuda. BUT in addition, it is even larger in absolute terms for cuda than for any other backend.
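For reference, the following is a minimal Python sketch (illustrative only, not part of the repository) of how these figures are derived: it parses the two 'TOTAL' counter lines from each output.txt quoted above and prints their difference, both in seconds and as a fraction of the overall total. The file paths and the counter line formats are taken from the log excerpts; everything else is an assumption.

```python
#!/usr/bin/env python3
# Illustrative sketch only (not part of madgraph4gpu): recompute the
# python/bash component as "GridPackCmd OVERALL TOTAL" minus
# "madevent PROGRAM TOTAL" from the output.txt logs quoted above.
import re

def total_seconds(path, tag):
    """Return the first '<tag> TOTAL <float>' value found in the log, or None."""
    pattern = re.compile(tag + r" TOTAL\s+([0-9.]+)")
    with open(path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                return float(m.group(1))
    return None

for backend in ("fortran", "cppnone", "cppsse4", "cppavx2",
                "cpp512y", "cpp512z", "cuda"):
    path = f"pp_dy4j.mad/{backend}/output.txt"  # paths as quoted above
    overall = total_seconds(path, "OVERALL")    # [GridPackCmd.launch] line
    program = total_seconds(path, "PROGRAM")    # [madevent COUNTERS] line
    if overall is None or program is None:
        continue  # counter not found in this log
    overhead = overall - program
    print(f"{backend:8s} python/bash = {overhead:6.1f}s "
          f"({100 * overhead / overall:4.1f}% of the overall total)")
```

Run over the seven logs above, this would reproduce the three bullet figures (and the corresponding numbers for the remaining cpp backends).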

To be understood...