WIICA: Workload ISA-Independent Characterization for Applications v1.0 Public Release ================================================================= WIICA is a workload characterization tool to characterize the ISA-independent characteristics of applications in the context of specialized architectures. If you use WIICA in your research, please cite: ISA-Independent Workload Characterization and its Implications for Specialized Architectures, Yakun Sophia Shao and David Brooks, International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2013 ================================================================== 0. Build WIICA 1) LLVM 3.4 and Clang 3.4 64-bit 2) LLVM IR Trace Profiler (LLVM-Tracer) LLVM-Tracer is an LLVM compiler pass that instruments code in LLVM machine-independent IR. It prints out a dynamic trace of your program, which then be take as input for WIICA (and Aladdin.) You can download LLVM-Tracer from here: [https://github.com/ysshao/LLVM-Tracer] To build LLVM-Tracer, please follow the instructions in README.md in LLVM-Tracer. ================================================================= 1. Run WIICA: After you build LLVM-Tracer, you need to 1) Set environment variable TRACER_HOME to /path/to/your/LLVM-Tracer: ``` export TRACER_HOME=/path/to/your/LLVM-Tracer/ ``` 2) Specify the kernels that you want to instrument. For example, for Benchmark FFT, we want to instrument functions: fft1D_512, step1, step2, ..., step11. In the `compile.py` script, we already specify all the common functions in SHOC in the `kernel` dictionary. An example to run wiica is: cd scripts python run_wiica.py --directory /your/path/to/wiica/SHOC/fft/ --source fft --analysis_types memory ================================================================= Related scripts: 1) run_wiica.py The interface of wiica. usage: run_wiica.py [-h] [--directory DIRECTORY] [--source SOURCE] [--analysis_types [{opcode,staticinst,memory,branch,basicblock,register}]] optional arguments: -h, --help show this help message and exit --directory DIRECTORY ABSOLUTE directory of the benchmark --source SOURCE a list of source files with suffixes, e.g. fft.c, md.c, etc. --analysis_types [{opcode,staticinst,memory,branch,basicblock,register} ] Type of analysis. Separate multiple values with spaces. The supported analysis types are shown. 2) compile.py Compiling the program with LLVM-Tracer to generate a dynamic LLVM IR trace. 3) process_trace.py For those benchmarks with "llvm.memset" instrinsic. It replaces "llvm.memset" with several non-intrinsic instructions. 4) analysis.py Performing opcode,staticinst,memory,branch,basicblock,register analysis. Opcode: Opcode Breakdown into Compute, Memory, and Branch StaticInst: Number of dynamic executions for each static instruction, sorted by the dynamic counts Memory: Memory Footprint, Memory Global/Local Entropy[Shao2013] Branch: Branch Entropy[Shao2013] BasicBlock: Size and number of dynamic executions of each basic block 5) mem_analysis.py Spatial Locality Score, see [Weinberg2005] for more details. Temporal Locality Score, see [Weinberg2005] for more details. 6) reg_analysis.py Register Degrees: The averarge use of registers, equals to the total number of register read divided by the total number of register write, see [Franklin1992] for more details. Register Distribution: The distribution of the register dependency distance. Register Lifetime: The distribution of the distance between the creation and the last use of registers, see [Franklin1992] for more details. Register Number: The number of register required at a certain point. We assume the application is executed 1 instruction per cycle. ================================================================= 2.WIICA Outputs: Stats files are generated to store the results including: (These files are generated from analysis.py) [bench name]_opcode_profile [bench name]_staticinst_profile [bench name]_footprint Memory footprint [bench name]_mem_entropy [bench name]_branch_entropy [bench name]_basicblock_profile (These files are generated from mem_analysis.py) [bench name]_spatial_locality [bench name]_temporal_locality [bench name]_stride_profile Used to compute spatial locality [bench name]_reuse_profile Used to compute temporal locality (These files are generated from reg_analysis.py) [bench name]_reg_degree total read / total write [bench name]_reg_distribution dependency distance distribution [bench name]_reg_lifetime the distance (between when the register is created with when it is used for the last time) distribution [bench name]_reg_number The dynamic register number needed at each cycle (assume 1 cycle / instruction) [bench name]_reg_maxn The maximun number in [bench name]_reg_number, which is the minimun number of registers needed to run the program ================================================================= 4. Feedback Feel free to leave us a message on github if you have any questions or comments. ================================================================= Yu Emma Wang, Sophia Yakun Shao VLSI-Arch group Harvard University July 26, 2014 ================================================================= References: [Weinberg2005] J. Weinberg, M.O. McCracken, E. Strohmaier, and A. Snavely. Quantifying Locality in the Memory Access Patterns of HPC Applications, SC, 2005 [Shao2013] Y.S. Shao and D. Brooks. ISA-Independent Workload Characterization and its Implications for Specialized Architecture, ISPASS, 2013 [Franklin1992] Franklin, M., & Sohi, G. S. Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors. In ACM SIGMICRO Newsletter (Vol. 23, No. 1-2, pp. 236-245). IEEE Computer Society Press, 1992.