In this repo, we analyze 96 versions of a blocked matrix multiplication program, one for each combination of the following factors (2 matrix sizes × 6 loop orders × 8 tile sizes).
- Matrix size: 2048, 8192
- Outer loop order (variants): ‘ijk’, ‘jik’, ‘jki’, ‘ikj’, ‘kij’, ‘kji’
- Tile size: 4, 8, 16, 32, 64, 128, 256, 512
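To make the three factors concrete, here is a minimal Python sketch. It is an illustration only, not the repo's benchmark code. It enumerates the 96 combinations and shows one way a blocked multiplication can take the outer loop order and tile size as parameters. The names `configurations` and `blocked_matmul` are hypothetical, and the fixed `i, k, j` order inside each tile is an assumption made for the example.

```python
from itertools import product

MATRIX_SIZES = (2048, 8192)
LOOP_ORDERS = ("ijk", "jik", "jki", "ikj", "kij", "kji")
TILE_SIZES = (4, 8, 16, 32, 64, 128, 256, 512)


def configurations():
    """Return every (matrix size, loop order, tile size) combination."""
    return list(product(MATRIX_SIZES, LOOP_ORDERS, TILE_SIZES))


def blocked_matmul(a, b, n, tile, order="ijk"):
    """Multiply two n x n matrices (lists of lists) tile by tile.

    `order` names the nesting of the three outer, tile-level loops,
    outermost first. The loops inside a tile always run i, k, j
    (an assumption of this sketch).
    """
    c = [[0.0] * n for _ in range(n)]
    starts = range(0, n, tile)
    for t0 in starts:
        for t1 in starts:
            for t2 in starts:
                # Map the loop variables back to i/j/k according to `order`.
                base = dict(zip(order, (t0, t1, t2)))
                i0, j0, k0 = base["i"], base["j"], base["k"]
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, n)):
                        a_ik = a[i][k]
                        for j in range(j0, min(j0 + tile, n)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

`len(configurations())` is 96, matching the count above. Every loop order produces the same product; only the memory access pattern changes, which is what the counters are meant to expose.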
The analysis records hardware performance counters, including cache misses, TLB misses, CPU cycles, and branch instructions. We collect them with the perforator CLI tool.
curl https://zyedidia.github.io/eget.sh | sh
./eget zyedidia/perforator
pip3 install regex pandas
out2048/
out8192/
csv/
- generate .out files in out2048/
python3 outgen2048.py
- generate csv files in csv/
python3 execout2048.py
- generate .out files in out8192/
python3 outgen8192.py
- generate csv files in csv/
python3 execout8192.py
sudo ./perforator -e cpu-cycles,branch-misses,branch-instructions,cache-misses,dtlb-read-accesses,dtlb-read-misses,dtlb-write-misses,l1d-read-accesses,l1d-read-misses,l1d-write-accesses,l1i-read-misses,ll-read-accesses,ll-read-misses,ll-write-accesses,ll-write-misses -r multiplication --csv ./out2048/v3_2048_32 > csv/v3_2048_32.csv
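The command above can be assembled programmatically for any binary. The sketch below is a hypothetical helper, not the repo's `execout2048.py`. It assumes the file name pattern `v<variant>_<size>_<tile>` inferred from `v3_2048_32`, and uses only the `-e`, `-r` and `--csv` flags shown in that command. `EVENTS` is a short subset of the events listed there; extend it as needed.

```python
# Subset of the perforator events used in the command above.
EVENTS = ("cpu-cycles", "cache-misses", "dtlb-read-misses")


def perforator_job(variant, size, tile, events=EVENTS, region="multiplication"):
    """Build the perforator argv and the CSV path for one binary.

    Assumes binaries are named v<variant>_<size>_<tile> and live in
    out<size>/ (a naming convention inferred from the example command).
    """
    name = f"v{variant}_{size}_{tile}"
    argv = [
        "sudo", "./perforator",
        "-e", ",".join(events),   # comma-separated event list
        "-r", region,             # function (region) to profile
        "--csv", f"./out{size}/{name}",
    ]
    return argv, f"csv/{name}.csv"
```

To run a job, pass `argv` to `subprocess.run` with `stdout` redirected to a file opened at the returned CSV path, which mirrors the `>` redirection in the shell command.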