/WIICA

Workload ISA-Independent Characterization of Applications

Primary LanguageCOtherNOASSERTION

WIICA: Workload ISA-Independent Characterization for Applications

v1.0 Public Release
=================================================================
WIICA is a workload characterization tool to characterize the ISA-independent
characteristics of applications in the context of specialized architectures.

If you use WIICA in your research, please cite:

ISA-Independent Workload Characterization and its Implications for Specialized
Architectures,
Yakun Sophia Shao and David Brooks,
International Symposium on Performance Analysis of Systems and Software
(ISPASS), April 2013

==================================================================
0. Build WIICA

1) LLVM 3.4 and Clang 3.4 64-bit
2) LLVM IR Trace Profiler (LLVM-Tracer)
   LLVM-Tracer is an LLVM compiler pass that instruments code in LLVM
   machine-independent IR. It prints out a dynamic trace of your program, which
   then be take as input for WIICA (and Aladdin.)

   You can download LLVM-Tracer from here:
   [https://github.com/ysshao/LLVM-Tracer]

   To build LLVM-Tracer, please follow the instructions in README.md in
   LLVM-Tracer.
=================================================================
1. Run WIICA:
After you build LLVM-Tracer, you need to
1) Set environment variable TRACER_HOME to /path/to/your/LLVM-Tracer:
   ```
   export TRACER_HOME=/path/to/your/LLVM-Tracer/
   ```
2) Specify the kernels that you want to instrument. For example, for Benchmark
FFT, we want to instrument functions: fft1D_512, step1, step2, ..., step11. In
the `compile.py` script, we already specify all the common functions in SHOC in
the `kernel` dictionary.

An example to run wiica is:
  cd scripts
  python run_wiica.py --directory /your/path/to/wiica/SHOC/fft/ --source fft --analysis_types memory

=================================================================

Related scripts:
1) run_wiica.py
	The interface of wiica.
	usage: run_wiica.py [-h] [--directory DIRECTORY]
                    [--source SOURCE]
                    [--analysis_types [{opcode,staticinst,memory,branch,basicblock,register}]]

optional arguments:
  -h, --help            show this help message and exit
  --directory DIRECTORY
                        ABSOLUTE directory of the benchmark
  --source SOURCE       a list of source files with suffixes, e.g. fft.c, md.c, etc.
  --analysis_types [{opcode,staticinst,memory,branch,basicblock,register} ]
                        Type of analysis. Separate multiple values with
                        spaces. The supported analysis types are shown.

2) compile.py
  Compiling the program with LLVM-Tracer to generate a dynamic LLVM IR trace.

3) process_trace.py
    For those benchmarks with "llvm.memset" instrinsic. It replaces "llvm.memset" with several non-intrinsic instructions.

4) analysis.py
   Performing opcode,staticinst,memory,branch,basicblock,register analysis.
   Opcode: Opcode Breakdown into Compute, Memory, and Branch
   StaticInst: Number of dynamic executions for each static instruction, sorted
               by the dynamic counts
   Memory: Memory Footprint, Memory Global/Local Entropy[Shao2013]
   Branch: Branch Entropy[Shao2013]
   BasicBlock: Size and number of dynamic executions of each basic block

5) mem_analysis.py
   Spatial Locality Score, see [Weinberg2005] for more details.
   Temporal Locality Score, see [Weinberg2005] for more details.

6) reg_analysis.py
   Register Degrees: The averarge use of registers, equals to the total number of register read divided by the total number of register write, see [Franklin1992] for more details.
   Register Distribution: The distribution of the register dependency distance.
   Register Lifetime: The distribution of the distance between the creation and the last use of registers, see [Franklin1992] for more details.
   Register Number: The number of register required at a certain point. We assume the application is executed 1 instruction per cycle.

=================================================================
2.WIICA Outputs:
	Stats files are generated to store the results including:

	(These files are generated from analysis.py)
	[bench name]_opcode_profile
	[bench name]_staticinst_profile
	[bench name]_footprint			Memory footprint
	[bench name]_mem_entropy
	[bench name]_branch_entropy
	[bench name]_basicblock_profile

	(These files are generated from mem_analysis.py)
	[bench name]_spatial_locality
	[bench name]_temporal_locality
	[bench name]_stride_profile		Used to compute spatial locality
	[bench name]_reuse_profile		Used to compute temporal locality

	(These files are generated from reg_analysis.py)
	[bench name]_reg_degree			total read / total write
	[bench name]_reg_distribution		dependency distance distribution
	[bench name]_reg_lifetime		the distance (between when the register is created with when it is used for the last time) distribution
	[bench name]_reg_number			The dynamic register number needed at each cycle (assume 1 cycle / instruction)
	[bench name]_reg_maxn			The maximun number in [bench name]_reg_number, which is the minimun number of registers needed to run the program

=================================================================
4. Feedback

Feel free to leave us a message on github if you have any questions or
comments.

=================================================================

Yu Emma Wang, Sophia Yakun Shao
VLSI-Arch group
Harvard University
July 26, 2014

=================================================================

References:
[Weinberg2005] J. Weinberg, M.O. McCracken, E. Strohmaier, and A. Snavely.
Quantifying Locality in the Memory Access Patterns of HPC Applications, SC, 2005
[Shao2013] Y.S. Shao and D. Brooks.
ISA-Independent Workload Characterization and its Implications for Specialized
Architecture, ISPASS, 2013
[Franklin1992] Franklin, M., & Sohi, G. S.
Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors.
In ACM SIGMICRO Newsletter (Vol. 23, No. 1-2, pp. 236-245). IEEE Computer Society Press, 1992.