Hopscotch v1.0
==============

Author: Alif Ahmed
email: alihahmed@virginia.edu


I. Overview

    Hopscotch is a micro-benchmark suite for memory performance evaluation.
    It currently supports empirical roofline measurement, bandwidth
    measurement with different access patterns, and latency measurement.
    GPUs are also supported for the roofline benchmark.

    A memory access pattern visualizer (MAPProfiler) is also included with
    Hopscotch. MAPProfiler currently supports only x86_64 executables.
    It is under active development.

    The latest source code can be found at:
        https://github.com/alifahmed/hopscotch

    Details about most of the kernels and benchmarks can be found in the paper:
        A. Ahmed, K. Skadron, "Hopscotch: A Micro-benchmark Suite for Memory
        Performance Evaluation", MEMSYS, 2019.


II. Directory Structure

    hopscotch/
    |
    |---cpu/                Directory containing benchmarks for memory connected to CPU.
    |   |
    |   |---1_roofline/     Roofline benchmark (CPU version).
    |   |---2_bandwidth/    Benchmarks memory with different access patterns.
    |   |---3_latency/      Latency benchmark.
    |   |---4_cache/        Benchmark for evaluating caching.
    |   |---common/         Source code common to all benchmarks.
    |   |---include/        Common header files.
    |   |---kernels/        Common kernels. Used by different benchmarks.
    |
    |---gpu/
    |   |
    |   |---1_roofline/     Roofline benchmark (CUDA version).
    |   |---common/         Source code common to all benchmarks.
    |   |---include/        Common header files.
    |
    |---MAPProfiler/        A tool for memory access pattern visualization.
    |
    |---Makefile            Top level build file.
    |---README              This file.

    More details can be found in the README files inside these directories.


III. Prerequisites

    a) Python 3
    b) MAPProfiler requires Intel Pin Tool. Check the README inside
       MAPProfiler for details.


IV. Installation

    Running make from the top directory builds all the sub-directories.
    Binaries are created inside the respective benchmark's directory.
    Make can also be run inside a benchmark's directory to build just that
    specific benchmark.
    Some benchmarks use scripts to rebuild the binaries with different
    configurations.


V. CPU Benchmarks

    1_roofline
    ==========
    Measures the maximum attainable performance with varying arithmetic
    intensity, and the machine balance.

    To run:
        ./roofline.py

    The Python script will generate a PDF of the roofline plot. Available
    options can be listed with:
        ./roofline.py --help


    2_bandwidth
    ===========
    Measures bandwidth with different types of access patterns.

    To run:
        a) make
        b) ./bandwidth

    The working set size can be changed by defining WSS_EXP. The number of
    elements in the working set is (2 ^ WSS_EXP). WSS_EXP can be defined
    directly when compiling manually, or passed through USER_DEFS.

    Example:
        a) make USER_DEFS="-DWSS_EXP=32"
        b) ./bandwidth


    3_latency
    =========
    Measures latency with a single-threaded pointer-chasing kernel. The
    working set size is varied.

    To run:
        ./latency.py

    The Python script will generate a PDF of the latency plot. Available
    options can be listed with:
        ./latency.py --help


    4_cache
    =======
    Measures cache efficiency by running workloads with different spatial and
    temporal locality ({low,low}, {low,high}, {high,low}, {high,high}).

    To run:
        a) make
        b) ./cache


VI. GPU Benchmarks

    1_roofline
    ==========
    Measures the maximum attainable performance with varying arithmetic
    intensity, and the machine balance. Supports single and double precision
    floating point operations.

    To run:
        ./roofline.py

    The Python script will generate a PDF of the roofline plot. Available
    options can be listed with:
        ./roofline.py --help


VII. Acknowledgement

    This work was supported by CRISP, one of six centers in JUMP, a
    Semiconductor Research Corporation (SRC) program sponsored by DARPA, and
    Brookhaven National Laboratory.
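

Note on WSS_EXP (2_bandwidth)

As a supplement to the 2_bandwidth section, the working-set arithmetic can be sketched as below. The element count (2 ^ WSS_EXP) comes from this README. The 8-byte element size and the helper name working_set_bytes are assumptions made for illustration and are not taken from the Hopscotch sources, so check the benchmark code for the actual element type.

```python
# Sketch: estimate the memory footprint implied by a given WSS_EXP.
# ASSUMPTION: 8 bytes per element (e.g. a double). This is hypothetical;
# the real element type is defined in the benchmark sources.

def working_set_bytes(wss_exp, elem_bytes=8):
    """Return (element count, total bytes) for 2^wss_exp elements."""
    elems = 1 << wss_exp          # number of elements = 2 ^ WSS_EXP
    return elems, elems * elem_bytes


if __name__ == "__main__":
    for exp in (20, 26, 32):
        elems, nbytes = working_set_bytes(exp)
        gib = nbytes / float(1 << 30)
        print("WSS_EXP=%d: %d elements, %.3f GiB" % (exp, elems, gib))
```

Under the 8-byte assumption, WSS_EXP=32 corresponds to 32 GiB for one working set, so it may be worth checking the value against the available memory before building.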