This project optimizes diagonal matrix multiplication using hardware counters. It contains two folders:
- PartA: single-threaded and multi-threaded programs.
- PartB: GPU program.
Each folder contains two sub-folders, a Makefile, and a main program.
- Makefile: Contains commands necessary to compile, generate inputs, and run the program.
- data folder: Contains the program that generates the input files, and holds those files once they are generated.
- header folder: Contains the files that define the function performing the operation.
- main.cpp: Program that takes inputs and executes the functions.
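As a rough illustration of the kind of function the header folder might hold, here is a minimal C++ sketch. It rests on an assumption: this README does not define the operation, so the sketch takes "diagonal matrix multiplication" to mean the product D * A of a diagonal matrix D (stored as its diagonal entries only) and a dense row-major matrix A. The name diag_times_dense and its signature are hypothetical and are not taken from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the repository.
// Assumption: the operation is C = D * A, where D is an n x n diagonal
// matrix stored as its n diagonal entries and A is a dense n x n
// row-major matrix. Row i of C is row i of A scaled by diag[i].
std::vector<double> diag_times_dense(const std::vector<double>& diag,
                                     const std::vector<double>& a,
                                     std::size_t n) {
    assert(diag.size() == n);
    assert(a.size() == n * n);

    std::vector<double> c(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        const double d = diag[i];  // load the scale factor once per row
        for (std::size_t j = 0; j < n; ++j) {
            // Row-major traversal keeps memory accesses contiguous.
            c[i * n + j] = d * a[i * n + j];
        }
    }
    return c;
}
```

Under this assumption, calling diag_times_dense({2.0, 3.0}, {1.0, 2.0, 3.0, 4.0}, 2) scales the first row by 2 and the second by 3. Storing only the diagonal avoids the n^2 multiplications by zero that a general matrix product would perform.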
Navigate to each folder to set up the programs. Inside each folder, do the following:
Use the following command to compile the programs and generate required input:
make
You can use make to run the executable with the following command:
make run
Alternatively, you can manually run the program for the different input sets using the following commands:
./diag_mult data/input_4096.in
./diag_mult data/input_8192.in
./diag_mult data/input_16384.in
For the GPU program in PartB, use the following command to compile the code for a native GPU:
make server
For use with GPGPU-Sim, compilation requires additional flags, which the following command applies:
make sim
To run the executable natively, use the following command:
make run_server
When running on GPGPU-Sim, use the following command instead:
make run_sim