This repository contains a collection of tools for analyzing distributed-memory parallel applications written with MPI. The profiling interface provided by MPI makes it possible to collect detailed information about messaging, and it also provides a convenient place to enable other performance tools, including program sampling via interrupts and collection of aggregate values for hardware-counter events.

The methods to build and use these tools are described in separate directories. A brief overview is sketched here; please consult the README file in each directory for more information.

================================================================================

Tool : libmpitrace.so ; directory = src

purpose : Collect and report information on MPI calls, task placement,
          memory utilization, and user and system time.

outputs : text files : mpi_profile.jobid.rank
          optional binary program-sampling outputs : vmon.out.jobid.rank

requires : mpicc with the underlying C compiler set to gcc.

optional : Enable program sampling via the profil() routine. Program
           sampling requires GNU binutils development files.

Note : a different program-sampling method using hardware counters is
       preferred ... see the section on libhpmprof.so.

build : cd src
        ./configure                 (builds only the MPI wrappers)
        or
        ./configure --with-vprof --with-binutils=/path/to/binutils
        make libmpitrace.so

typical use : export LD_PRELOAD=/path/to/libmpitrace.so
              mpirun -np 2048 your.exe
              unset LD_PRELOAD

================================================================================

Tool : libmpihpm.so ; directory = src

purpose : Provides the same MPI information as libmpitrace.so, plus enables
          collection and reporting of aggregate values for hardware counters.
          By default, counts are reported from MPI_Init() to MPI_Finalize(),
          but one can instrument the code with calls to HPM_Start("label");
          ... HPM_Stop("label"); to collect counts for specific code
          sections, as sketched below.

outputs : text files : mpi_profile.jobid.rank ... MPI data
                       hpm_job_summary.jobid.group ... counter data

requires : mpicc with the underlying C compiler set to gcc. PAPI include
           and library paths, and a suitable set of hardware counters for
           your system's CPUs.

build : cd src
        ./configure --with-hpm=core --with-papi=/path/to/papi
        or
        ./configure --with-hpm=uncore --with-papi=/path/to/papi
        make libmpihpm.so

typical use : export LD_PRELOAD=/path/to/libmpihpm.so
              mpirun -np 2048 your.exe
              unset LD_PRELOAD
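For illustration, a minimal sketch of instrumenting a specific code section
follows. The calls HPM_Start("label") and HPM_Stop("label") are described
above; the void return type and const char * parameters are assumptions, so
check the src directory's README for the exact declarations and for how to
link against (or dynamically resolve from) libmpihpm.so.

    #include <mpi.h>

    /* Provided by libmpihpm.so; the exact prototypes are assumed here. */
    void HPM_Start(const char *label);
    void HPM_Stop(const char *label);

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        HPM_Start("solver");   /* begin counting for this code section */
        /* ... code of interest ... */
        HPM_Stop("solver");    /* counts are collected under "solver"  */

        MPI_Finalize();
        return 0;
    }

Counts for the "solver" section should then appear in the hpm_job_summary
output in addition to the default MPI_Init()-to-MPI_Finalize() totals.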
================================================================================

Tool : libhpmprof.so ; directory = hpmprof

purpose : Provides the same MPI information as libmpitrace.so, plus enables
          interrupt-based program sampling via hardware counters. This
          library is the preferred program-sampling method on systems that
          allow user-level access to hardware counters.

outputs : text files : mpi_profile.jobid.rank ...... MPI data
          binary files : hpm_histogram.jobid.rank ... pc sampling data

requires : mpicc with the underlying C compiler set to gcc. PAPI include
           and library paths with a suitable set of hardware counters for
           your system's CPUs, and GNU binutils development files.

build : cd hpmprof
        ./configure --with-binutils=/path/to/binutils --with-papi=/path/to/papi
        make libhpmprof.so

typical use : export LD_PRELOAD=/path/to/libhpmprof.so
              mpirun -np 2048 your.exe
              unset LD_PRELOAD
              bfdprof your.exe hpm_histogram.jobid.rank > source_profile.txt
              annotate_objdump your.exe hpm_histogram.jobid.rank > asm_profile.txt

================================================================================

Tools : bfdprof and annotate_objdump ; directory = bfdprof

purpose : These tools are required to analyze the outputs generated by either
          of the program-sampling methods. The bfdprof utility provides
          function-level and statement-level profile data, and the
          annotate_objdump utility provides profile data at the assembly
          level.

outputs : text files

requires : GNU binutils development files.

build : cd bfdprof
        ./configure --with-binutils=/path/to/binutils
        make

typical use : bfdprof your.exe hpm_histogram.jobid.rank > source_profile.txt
              annotate_objdump your.exe hpm_histogram.jobid.rank > asm_profile.txt

================================================================================

Alternate builds of libmpitrace.so

directory : ctx

Adds the ability to report MPI profile data separately for different code
regions. The user must annotate the source code to mark start/stop
boundaries for each code block of interest.

directory : nvtx

Adds NVIDIA NVTX range markers around the entry and exit of each MPI
function for graphical display in NVIDIA's visual profiling tools. This is
intended to add insight into the timelines of MPI calls alongside GPU kernel
execution; one possible viewer invocation is sketched below.
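As a hedged example of viewing these markers, one might run under NVIDIA
Nsight Systems. The invocation below is a sketch assuming Open MPI; the
-t trace selection and the %q{...} per-rank output naming are nsys features,
not part of this repository, and the library path is illustrative.

    export LD_PRELOAD=/path/to/nvtx/libmpitrace.so
    mpirun -np 4 nsys profile -t nvtx,cuda -o report.%q{OMPI_COMM_WORLD_RANK} your.exe
    unset LD_PRELOAD

The resulting per-rank reports can then be opened in the Nsight Systems GUI,
where the MPI ranges appear on the timeline alongside GPU kernel activity.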