- Analysed data criticality and locality, i.e. cache lines that are read versus cache lines that are only written, and exploited the disparity between read and write criticality through the proposed Read Write Partitioning (RWP), which divides the LLC into two logical partitions, one for dirty lines and one for clean lines.
- Implemented RWP in SimpleScalar 3.0 to predict the partition sizes most likely to yield future read hits, and analysed the results on benchmarks from the SPEC CPU2000 suite.
- Improved performance, reducing the miss rate by 10% and memory traffic by 20%.
- Compared RWP with prior state-of-the-art cache management policies and found that RWP outperforms all of them apart from a few outliers, which were also analysed.
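The two mechanisms in the first two bullets (evicting from the dirty or clean partition, and predicting the partition sizes) can be sketched in C, the language of cache.c. This is a minimal standalone sketch, not the project's actual cache.c code: struct blk, ASSOC, and the per-recency-position read-hit counters are hypothetical. It assumes that on a miss a set evicts its LRU dirty block when it holds more dirty blocks than the predicted dirty-partition size and its LRU clean block otherwise, and that the predicted size is the dirty/clean split that would have produced the most read hits.

```c
#include <assert.h>

#define ASSOC 4 /* ways per set (assumed; matches the 4-way caches in the run command) */

/* One cache block: valid bit, dirty bit and LRU stack position
   (0 = most recently used, ASSOC - 1 = least recently used). */
struct blk {
    int valid;
    int dirty;   /* 1 if the block has been written, else 0 */
    int lru_pos;
};

/* Number of valid dirty blocks in the set. */
int count_dirty(const struct blk set[ASSOC])
{
    int n = 0;
    for (int i = 0; i < ASSOC; i++)
        if (set[i].valid && set[i].dirty)
            n++;
    return n;
}

/* Victim selection: use an invalid way if there is one; otherwise, if the
   set holds more dirty blocks than the target dirty-partition size, evict
   the LRU dirty block, else evict the LRU clean block. Falls back to plain
   LRU if the chosen partition happens to be empty. */
int rwp_victim(const struct blk set[ASSOC], int target_dirty)
{
    assert(target_dirty >= 0 && target_dirty <= ASSOC);
    for (int i = 0; i < ASSOC; i++)
        if (!set[i].valid)
            return i;

    int want_dirty = count_dirty(set) > target_dirty;
    int victim = -1, oldest = -1, lru = 0;
    for (int i = 0; i < ASSOC; i++) {
        if (set[i].lru_pos > set[lru].lru_pos)
            lru = i;
        if (set[i].dirty == want_dirty && set[i].lru_pos > oldest) {
            oldest = set[i].lru_pos;
            victim = i;
        }
    }
    return victim >= 0 ? victim : lru;
}

/* Pick the dirty-partition size d in 0..ASSOC that maximises predicted
   read hits: reads that would hit in the d most recent dirty positions
   plus the ASSOC - d most recent clean positions. The per-position
   counters are hypothetical, e.g. gathered from sampled shadow tags. */
int best_dirty_size(const long read_hits_clean[ASSOC],
                    const long read_hits_dirty[ASSOC])
{
    int best_d = 0;
    long best = -1;
    for (int d = 0; d <= ASSOC; d++) {
        long hits = 0;
        for (int p = 0; p < d; p++)
            hits += read_hits_dirty[p];
        for (int p = 0; p < ASSOC - d; p++)
            hits += read_hits_clean[p];
        if (hits > best) {
            best = hits;
            best_d = d;
        }
    }
    return best_d;
}
```

For example, with illustrative (not measured) clean read-hit counts {50, 30, 20, 2} and dirty counts {15, 2, 1, 0} per recency position, best_dirty_size returns 1; in a full set holding two dirty blocks, rwp_victim then evicts the LRU dirty block even when a clean block is older, which is how the partition protects clean lines that are likely to be read again.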
Running the RWP

The project was run on SimpleScalar using the SPEC CPU2000 integer and floating-point benchmarks. To support it, the following files were modified: cache.c and cache.h. As described in the Appendix of the report, these changes implement read-write partitioning. We added a new replacement-policy switch, w, that selects read-write partitioning (RWP), analogous to l for LRU. The command used to run read-write partitioning is:
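In SimpleScalar 3.0, the last character of a cache configuration is mapped to a replacement policy by cache_char2policy() in cache.c ('l' for LRU, 'r' for random, 'f' for FIFO). The standalone sketch below mimics that mapping with the new w switch added; the function name char2policy and the enum value RWP are hypothetical, and the authors' actual edits are the ones described in the report's Appendix.

```c
#include <stdio.h>
#include <stdlib.h>

/* Replacement policies. LRU, Random and FIFO mirror SimpleScalar's
   cache.h; RWP is a hypothetical new entry for read-write partitioning. */
enum cache_policy { LRU, Random, FIFO, RWP };

/* Map the policy character of a -cache:* option (e.g. the 'w' in
   dl1:128:32:4:w) to a policy, in the style of cache_char2policy(). */
enum cache_policy char2policy(char c)
{
    switch (c) {
    case 'l': return LRU;
    case 'r': return Random;
    case 'f': return FIFO;
    case 'w': return RWP;   /* new switch for read-write partitioning */
    default:
        fprintf(stderr, "bogus replacement policy, `%c'\n", c);
        exit(EXIT_FAILURE);
    }
}
```

Adding the policy as one more case keeps the existing l, r and f options working unchanged, so the same simulator binary can run both the LRU baseline and RWP.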
./RUN$1 ../../simplesim-3.0/sim-outorder ../../spec2000binaries/$1*.peak.ev6 -max:inst 50000000 -fastfwd 20000000 -redir:sim $1_sim_output_rwp_32.log -bpred bimod -bpred:bimod 256 -bpred:ras 8 -bpred:btb 64 2 -cache:dl1 dl1:128:32:4:w -cache:dl2 ul2:1024:64:4:w

where w selects read-write partitioning.
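The -cache:dl1 and -cache:dl2 values use sim-outorder's <name>:<nsets>:<bsize>:<assoc>:<repl> format, so dl1:128:32:4:w is a 128-set, 32-byte-block, 4-way L1 data cache (16 KB) and ul2:1024:64:4:w is a 1024-set, 64-byte-block, 4-way unified L2 (256 KB), both using w. The sketch below decodes such a string and computes its capacity; struct cache_cfg, parse_cache_cfg and cache_bytes are hypothetical helpers, and the sscanf pattern is assumed to be close to, but not necessarily identical with, the one sim-outorder uses.

```c
#include <stdio.h>

/* Geometry of one -cache:* option: <name>:<nsets>:<bsize>:<assoc>:<repl>. */
struct cache_cfg {
    char name[32];
    int nsets, bsize, assoc;
    char repl;
};

/* Parse a config string such as "dl1:128:32:4:w".
   Returns 1 on success, 0 if any of the five fields is missing. */
int parse_cache_cfg(const char *opt, struct cache_cfg *cfg)
{
    return sscanf(opt, "%31[^:]:%d:%d:%d:%c",
                  cfg->name, &cfg->nsets, &cfg->bsize,
                  &cfg->assoc, &cfg->repl) == 5;
}

/* Total capacity in bytes = sets x block size x ways. */
long cache_bytes(const struct cache_cfg *cfg)
{
    return (long)cfg->nsets * cfg->bsize * cfg->assoc;
}
```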
To run RWP with a standard SimpleScalar installation, replace cache.c and cache.h with the versions we provide and recompile the simulator for the Alpha target (make config-alpha, then make). The command above then runs RWP on the selected benchmark ($1).
The data collected for the report and analysis is in a single spreadsheet, “RWP.xls”, which is included in the compressed deliverable.
To trace the memory operations and extract the load/store behaviour of the benchmarks, we used a SimpleScalar setup from GitHub, listed in the references.