Time profiles for DY+4j (and DY+3j) have a high 'python/bash' component - especially for cuda
Opened this issue · 0 comments
valassi commented
Documenting/Analysing further results of DY+4jet tests in #948
This is an issue that I had already mentioned for DY+3j in #994 for cuda. In DY+4j it is even larger, and it also becomes visible (to a lesser extent) for cpp and fortran. So I am splitting this off into a separate issue.
pp_dy4j.mad/fortran/output.txt (#events: 81)
[GridPackCmd.launch] OVERALL TOTAL 21707.6095 seconds
[madevent COUNTERS] PROGRAM TOTAL 21546.1
[madevent COUNTERS] Fortran Overhead 1579.09
[madevent COUNTERS] Fortran MEs 19967
--------------------------------------------------------------------------------
pp_dy4j.mad/cppnone/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL 26745.1639 seconds
[madevent COUNTERS] PROGRAM TOTAL 26584.9
[madevent COUNTERS] Fortran Overhead 1608.51
[madevent COUNTERS] CudaCpp MEs 24910.4
[madevent COUNTERS] CudaCpp HEL 66.0341
--------------------------------------------------------------------------------
pp_dy4j.mad/cppsse4/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL 14398.4664 seconds
[madevent COUNTERS] PROGRAM TOTAL 14231.3
[madevent COUNTERS] Fortran Overhead 1647.03
[madevent COUNTERS] CudaCpp MEs 12550.6
[madevent COUNTERS] CudaCpp HEL 33.7035
--------------------------------------------------------------------------------
pp_dy4j.mad/cppavx2/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL 7335.2356 seconds
[madevent COUNTERS] PROGRAM TOTAL 7114.43
[madevent COUNTERS] Fortran Overhead 1683.7
[madevent COUNTERS] CudaCpp MEs 5415.48
[madevent COUNTERS] CudaCpp HEL 15.2596
--------------------------------------------------------------------------------
pp_dy4j.mad/cpp512y/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL 6831.8971 seconds
[madevent COUNTERS] PROGRAM TOTAL 6649.98
[madevent COUNTERS] Fortran Overhead 1669.94
[madevent COUNTERS] CudaCpp MEs 4966.24
[madevent COUNTERS] CudaCpp HEL 13.8066
--------------------------------------------------------------------------------
pp_dy4j.mad/cpp512z/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL 7136.2962 seconds
[madevent COUNTERS] PROGRAM TOTAL 6958.96
[madevent COUNTERS] Fortran Overhead 1636.28
[madevent COUNTERS] CudaCpp MEs 5305.14
[madevent COUNTERS] CudaCpp HEL 17.5447
--------------------------------------------------------------------------------
pp_dy4j.mad/cuda/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL 2523.7488 seconds
[madevent COUNTERS] PROGRAM TOTAL 2234.93
[madevent COUNTERS] Fortran Overhead 1820.36
[madevent COUNTERS] CudaCpp MEs 97.9622
[madevent COUNTERS] CudaCpp HEL 316.613
--------------------------------------------------------------------------------
Specifically, the python/bash component ("GridPackCmd OVERALL TOTAL" minus "madevent PROGRAM TOTAL") is
- 300s (2500-2200) for cuda
- 180s (6830-6650) for cpp 512y
- 160s (21710-21550) for fortran
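
The subtraction above can be reproduced directly from the counter lines in each output.txt. The following is a minimal sketch (the helper name and regexes are illustrative, not taken from the actual madgraph4gpu scripts):

```python
import re

def python_bash_component(log_text):
    # Non-madevent ("python/bash") time is defined here as
    # "GridPackCmd OVERALL TOTAL" minus "madevent PROGRAM TOTAL".
    overall = float(re.search(r"OVERALL TOTAL\s+([\d.]+)", log_text).group(1))
    program = float(re.search(r"PROGRAM TOTAL\s+([\d.]+)", log_text).group(1))
    return overall - program

# Example using the cuda numbers quoted above
cuda_log = """[GridPackCmd.launch] OVERALL TOTAL 2523.7488 seconds
[madevent COUNTERS] PROGRAM TOTAL 2234.93"""
print(round(python_bash_component(cuda_log)))  # ~289s for cuda
```

The same helper applied to the cpp512y and fortran logs gives roughly 182s and 162s respectively, matching the rounded figures above.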
So again, this non-ME component becomes relatively more disturbing/visible for the faster MEs like cuda and simd. But in addition, in absolute terms it seems even higher for cuda than for fortran or cpp.
To be understood...