CO-Optimizer: Code-level On-chip memory Optimizer

This project helps you optimize the use of space-limited on-chip memories (L1 cache, shared memory) in your CUDA applications.

If you use or build on this tool, please cite the following papers.

Getting the Source Code and Building the On-chip Memory Optimizer

This is an example workflow and configuration for getting and building CO-Optimizer.

Tested with the following setup
- Ubuntu 18.04, cmake-3.10.2, gcc/g++-5
- sudo apt install gcc-5 g++-5 cmake
- sudo apt install libboost-all-dev
- Benchmarks: PolyBench/GPU and Rodinia
Check out llvm, clang, and co-optimizer
- llvm
  - git clone https://github.com/llvm-mirror/llvm
  - cd llvm
  - git checkout release_80
- clang
  - cd tools
  - git clone https://github.com/llvm-mirror/clang
  - cd clang
  - git checkout release_80
- co-optimizer
  - cd tools
  - git clone https://github.com/hjunkim/CO-Optimizer.git
Build them
- Add the CO-Optimizer repository to llvm/tools/clang/tools/CMakeLists.txt
  - add_clang_subdirectory(CO-Optimizer)
- cd ../../../../; mkdir build; cd build
- cmake -G "Unix Makefiles" ../llvm
- make -j 16; sudo make install
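
The checkout and build steps above can be collected into one script. The following is a minimal sketch, not part of the repository: the names `run`, `fetch_and_build`, `DRY_RUN`, and `JOBS` are hypothetical helpers introduced here, and appending the `add_clang_subdirectory` line to the end of CMakeLists.txt is an assumption about where it may go. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: fetch LLVM 8, Clang 8, and CO-Optimizer, then build and install.
# DRY_RUN and JOBS are hypothetical knobs of this sketch, not tool options.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them
JOBS="${JOBS:-16}"        # parallel make jobs

# Print a command; execute it only when DRY_RUN=0, stopping on failure.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

fetch_and_build() {
    # llvm
    run git clone https://github.com/llvm-mirror/llvm
    run cd llvm
    run git checkout release_80
    # clang, placed in llvm/tools
    run cd tools
    run git clone https://github.com/llvm-mirror/clang
    run cd clang
    run git checkout release_80
    # co-optimizer, placed in llvm/tools/clang/tools
    run cd tools
    run git clone https://github.com/hjunkim/CO-Optimizer.git
    # Register the tool with Clang's build (appended at the end of the file).
    run sh -c 'echo "add_clang_subdirectory(CO-Optimizer)" >> CMakeLists.txt'
    # Configure and build in a directory next to the llvm checkout.
    run cd ../../../..
    run mkdir build
    run cd build
    run cmake -G "Unix Makefiles" ../llvm
    run make -j "$JOBS"
    run sudo make install
}

fetch_and_build
```

Run it once as-is to review the printed commands, then rerun with `DRY_RUN=0` to execute them.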

Throttling:
- --csize=<int>: L1 cache size of the GPU (default: 32 KB)
- --nblks=<int>: number of thread blocks per SM (default: 4 blocks)
- --tbsize=<int>: thread block size (default: 8 warps)
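
To make the units of the throttling defaults concrete, the short calculation below converts them into threads and bytes. It is an illustrative sketch only: the variable names are hypothetical, the warp size of 32 threads is the standard value on NVIDIA GPUs, and the even split of L1 across resident blocks is an assumption for illustration, not a description of how the tool computes anything.

```shell
# Default values taken from the throttling options above.
csize_kb=32       # --csize: L1 cache size in KB
nblks=4           # --nblks: thread blocks per SM
tbsize=8          # --tbsize: thread block size in warps

warp_threads=32   # threads per warp on NVIDIA GPUs

# Thread block size expressed in threads rather than warps.
threads_per_block=$((tbsize * warp_threads))

# L1 bytes available to each block if the cache were split evenly
# (an assumption made for this example only).
l1_bytes_per_block=$((csize_kb * 1024 / nblks))

echo "threads per block: $threads_per_block"
echo "L1 bytes per block: $l1_bytes_per_block"
```

With the defaults this gives 256 threads per block and 8192 bytes of L1 per block.
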
Preloading:
- --prdsize=<string>: preloading size (default: 1)
- --tbsize=<string>: thread block size (default: 8 warps)