Pythia is a hardware-realizable, light-weight data prefetcher that uses reinforcement learning to generate accurate, timely, and system-aware prefetch requests.
Pythia formulates hardware prefetching as a reinforcement learning task. For every demand request, Pythia observes multiple different types of program context information to take a prefetch decision. For every prefetch decision, Pythia receives a numerical reward that evaluates prefetch quality under the current memory bandwidth utilization. Pythia uses this reward to reinforce the correlation between program context information and prefetch decision to generate highly accurate, timely, and system-aware prefetch requests in the future.
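The observe, act, reward, reinforce loop described above can be illustrated with a small tabular agent. This is a toy sketch, not Pythia's implementation: the action set, the single "address delta" feature used as state, the reward values, and the SARSA-style update rule are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical action set: prefetch offsets in cache lines; 0 means "no prefetch".
ACTIONS = [0, 1, 2, 4]


class ToyRLPrefetcher:
    """Tabular agent mapping (program context, prefetch offset) to a Q-value."""

    def __init__(self, alpha=0.1, gamma=0.5, epsilon=0.0, seed=0):
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration probability
        self.q = defaultdict(float)   # (state, action) -> Q-value
        self.rng = random.Random(seed)

    def select_action(self, state):
        # Explore with probability epsilon, otherwise pick the best-known offset.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_action):
        # SARSA-style update (an assumption of this sketch).
        target = reward + self.gamma * self.q[(next_state, next_action)]
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def reward_for(action, useful, bandwidth_high):
    """Illustrative reward values; inaccuracy costs more when bandwidth is scarce."""
    if action == 0:
        return -2 if bandwidth_high else -4
    if useful:
        return 20
    return -14 if bandwidth_high else -8


def simulate(agent, accesses, bandwidth_high=False):
    """Feed a list of demanded cache-line addresses to the agent."""
    demanded = set(accesses)
    pending = None                    # (state, action, reward) awaiting its update
    last_addr = accesses[0]
    for addr in accesses[1:]:
        state = addr - last_addr      # hypothetical feature: address delta
        action = agent.select_action(state)
        useful = action != 0 and (addr + action) in demanded
        reward = reward_for(action, useful, bandwidth_high)
        if pending is not None:
            agent.update(*pending, state, action)
        pending = (state, action, reward)
        last_addr = addr
    return agent


# A stride-2 stream: the agent is penalized for offsets 0 and 1, rewarded for 2.
agent = simulate(ToyRLPrefetcher(), list(range(0, 400, 2)))
print(agent.select_action(2))
```

In this toy, the same offset earns a lower reward when `bandwidth_high` is set and the prefetch is inaccurate, which is one simple way to make decisions sensitive to memory bandwidth utilization.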
Pythia was presented at MICRO 2021.
Rahul Bera, Konstantinos Kanellopoulos, Anant V. Nori, Taha Shahroodi, Sreenivas Subramoney, Onur Mutlu, "Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning", In Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2021
Pythia is implemented in the ChampSim simulator. We have significantly modified ChampSim's prefetcher integration pipeline to support a wide range of prior prefetching proposals, listed below:
- Stride [Fu+, MICRO'92]
- Streamer [Chen and Baer, IEEE TC'95]
- SMS [Somogyi+, ISCA'06]
- AMPM [Ishii+, ICS'09]
- Sandbox [Pugsley+, HPCA'14]
- BOP [Michaud, HPCA'16]
- SPP [Kim+, MICRO'16]
- Bingo [Bakshalipour+, HPCA'19]
- SPP+PPF [Bhatia+, ISCA'19]
- DSPatch [Bera+, MICRO'19]
- MLOP [Shakerinava+, DPC-3'19]
- IPCP [Pakalapati+, ISCA'20]
Most of the prefetchers (e.g., SPP [1], Bingo [2], IPCP [3]) reuse code from the 2nd and 3rd Data Prefetching Championships (DPC). Others (e.g., AMPM [4], SMS [5]) are implemented from scratch and show relative performance similar to that reported in prior work.
The infrastructure has been tested with the following system configuration:
- G++ v6.3.0 20170516
- CMake v3.20.2
- md5sum v8.26
- Perl v5.24.1
- [DEPRECATED] Megatools v1.11.0 (note that v1.9.98 does NOT work)
- Install the necessary prerequisites:
    sudo apt install perl
- Clone the GitHub repo:
    git clone https://github.com/CMU-SAFARI/Pythia.git
- Clone the bloomfilter library inside the Pythia home directory:
    cd Pythia
    git clone https://github.com/mavam/libbf.git libbf
- Build the bloomfilter library. This should create the static library libbf.a inside the build directory:
    cd libbf
    mkdir build && cd build
    cmake ../
    make clean && make
- Build Pythia for single-core or multi-core simulation using the build script. This should create the executable inside the bin directory:
    cd $PYTHIA_HOME
    # ./build_champsim.sh <l1_pref> <l2_pref> <llc_pref> <ncores>
    ./build_champsim.sh multi multi no 1
  Please use build_champsim_highcore.sh to build ChampSim for more than four cores.
- Set the appropriate environment variables as follows:
    source setvars.sh
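The build-script conventions above can be captured in a small helper. This is a sketch under stated assumptions: it assumes build_champsim_highcore.sh accepts the same four arguments as build_champsim.sh, and it infers the executable naming pattern from the single example name `perceptron-multi-multi-no-ship-1core` that appears in this README. The defaults `perceptron` and `ship` are taken from that example, not from the build scripts themselves.

```python
def build_command(l1_pref, l2_pref, llc_pref, ncores):
    """Return the build invocation as an argument list.

    Assumption: the high-core script takes the same arguments as the regular one.
    """
    script = "./build_champsim_highcore.sh" if ncores > 4 else "./build_champsim.sh"
    return [script, l1_pref, l2_pref, llc_pref, str(ncores)]


def expected_binary_name(l1_pref, l2_pref, llc_pref, ncores,
                         branch_pred="perceptron", llc_repl="ship"):
    """Guess the executable name under bin/ (pattern inferred from one example)."""
    return f"{branch_pred}-{l1_pref}-{l2_pref}-{llc_pref}-{llc_repl}-{ncores}core"


print(" ".join(build_command("multi", "multi", "no", 1)))
print(expected_binary_name("multi", "multi", "no", 1))
```

After building, check the contents of the bin directory rather than relying on the guessed name.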
Update on May 22, 2024: The megatools-based trace distribution framework has been deprecated due to stability issues. We have uploaded the Ligra and PARSEC 2.1 traces to Google Drive (links below). Please download the traces from these links until we find a better alternative to Mega.
- [DEPRECATED] Install the megatools executable:
    cd $PYTHIA_HOME/scripts
    wget https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-linux-x86_64.tar.gz
    tar -xvf megatools-1.11.1.20230212-linux-x86_64.tar.gz
  Note: The megatools link might change in the future depending on the latest release. Please recheck the link if the download fails.
- Use the download_traces.pl Perl script to download the ChampSim traces used in our paper:
    mkdir $PYTHIA_HOME/traces/
    cd $PYTHIA_HOME/scripts/
    perl download_traces.pl --csv artifact_traces.csv --dir ../traces/
  Note: The script should download 233 traces. Please check the final log for any incomplete downloads. The total size of all traces is ~52 GB.
  Update on May 22, 2024: Ligra and PARSEC traces may fail to download. Please use the new Google Drive links mentioned below for downloading.
- Once the trace download completes, please verify the checksums as follows. Please make sure all traces pass the checksum test.
    cd $PYTHIA_HOME/traces
    md5sum -c ../scripts/artifact_traces.md5
- If the traces are downloaded to some other path, please change the full paths in experiments/MICRO21_1C.tlist and experiments/MICRO21_4C.tlist accordingly.
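Where `md5sum` is unavailable, the checksum verification step above can be approximated in Python. This is a minimal sketch that assumes artifact_traces.md5 uses the standard `md5sum` output format (a hex digest, whitespace, then a file name, optionally prefixed with `*`). The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file without loading it fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(md5_file, trace_dir):
    """Return {file name: "OK" | "FAILED" | "MISSING"} for each listed file."""
    results = {}
    for line in Path(md5_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()   # "*" marks binary mode in md5sum output
        path = Path(trace_dir) / name
        if not path.is_file():
            results[name] = "MISSING"
        elif md5_of(path) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

A call such as `verify_checksums("../scripts/artifact_traces.md5", ".")` from inside the traces directory mirrors the `md5sum -c` invocation. Any entry that is not `"OK"` indicates a trace to download again.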
- We are also releasing a new set of ChampSim traces from PARSEC 2.1 and Ligra. The trace drop-points are measured using Intel PinPlay and the traces are captured by the ChampSim PIN tool. The traces can be found at the following links. To download these traces in bulk, please use the "Download as ZIP" option from the mega.io web interface.
- Our simulation infrastructure is fully compatible with all prior ChampSim traces used in CRC-2 and DPC-3. One can also convert the CVP-2 traces (courtesy of Qualcomm Datacenter Technologies) to ChampSim format using the following converter. The traces can be found on the following websites:
- CRC-2 traces: http://bit.ly/2t2nkUj
- DPC-3 traces: http://hpca23.cse.tamu.edu/champsim-traces/speccpu/
- CVP-2 traces: https://www.microarch.org/cvp1/cvp2/rules.html
Our experimental workflow consists of two stages: (1) launching experiments, and (2) rolling up statistics from experiment outputs.
- To create the necessary experiment commands in bulk, we will use scripts/create_jobfile.pl.
- create_jobfile.pl requires three necessary arguments:
  - exe: the full path of the executable to run
  - tlist: contains trace definitions
  - exp: contains knobs of the experiments to run
- Create experiments as follows. Please make sure the paths used in the tlist and exp files are appropriate.
    cd $PYTHIA_HOME/experiments/
    perl ../scripts/create_jobfile.pl --exe $PYTHIA_HOME/bin/perceptron-multi-multi-no-ship-1core --tlist MICRO21_1C.tlist --exp MICRO21_1C.exp --local 1 > jobfile.sh
- Go to a run directory (or create one) inside experiments to launch runs in the following way:
    cd experiments_1C
    source ../jobfile.sh
- If you have Slurm support to launch multiple jobs on a compute cluster, please provide --local 0 to create_jobfile.pl.
- To roll up stats in bulk, we will use scripts/rollup.pl.
- rollup.pl requires three necessary arguments:
  - tlist
  - exp
  - mfile: specifies stat names and the reduction method used to roll them up
- Roll up statistics as follows. Please make sure the paths used in the tlist and exp files are appropriate.
    cd experiments_1C/
    perl ../../scripts/rollup.pl --tlist ../MICRO21_1C.tlist --exp ../MICRO21_1C.exp --mfile ../rollup_1C_base_config.mfile > rollup.csv
- Export the rollup.csv file to your favourite data processor (Python Pandas, Excel, Numbers, etc.) to gain insights.
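As one example of post-processing the rolled-up statistics, the sketch below computes the geometric-mean speedup of each experiment over a baseline using only the Python standard library. The column names (`Trace`, `Exp`, `IPC`), the experiment names, and the sample values are hypothetical placeholders. The real columns in rollup.csv depend on the mfile you use, so adjust the names accordingly.

```python
import csv
import io
import math
from collections import defaultdict

# Made-up sample data standing in for rollup.csv; not real results.
SAMPLE = """Trace,Exp,IPC
t1,nopref,1.0
t1,pythia,1.2
t2,nopref,2.0
t2,pythia,2.5
"""


def geomean_speedup(csv_text, baseline="nopref"):
    """Return {experiment: geometric-mean IPC speedup over the baseline}."""
    ipc = defaultdict(dict)                      # trace -> experiment -> IPC
    for row in csv.DictReader(io.StringIO(csv_text)):
        ipc[row["Trace"]][row["Exp"]] = float(row["IPC"])

    log_speedups = defaultdict(list)
    for per_exp in ipc.values():
        base = per_exp[baseline]
        for exp, value in per_exp.items():
            log_speedups[exp].append(math.log(value / base))

    # Averaging logs and exponentiating gives the geometric mean.
    return {exp: math.exp(sum(v) / len(v)) for exp, v in log_speedups.items()}


for exp, speedup in geomean_speedup(SAMPLE).items():
    print(f"{exp}: {speedup:.3f}")
```

To run it on real output, pass the file contents instead of the sample, for example `geomean_speedup(open("rollup.csv").read(), baseline=...)` with the baseline experiment name from your exp file.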
We also implement Pythia in Chisel HDL to faithfully measure its area and power cost. The implementation, along with the reports from the umcL65 library, can be found in the following GitHub repo. Please note that the area and power projections in the sample report are different from those reported in the paper due to the different technology used.
Pythia was code-named Scooby (the mystery-solving dog) during development. So any mention of Scooby anywhere in the code refers to Pythia.
- The top-level files for Pythia are prefetchers/scooby.cc and inc/scooby.h. These two files declare and define the high-level functions for Pythia (e.g., invoke_prefetcher, register_fill, etc.).
- The released version of Pythia has two types of RL engine defined: basic and featurewise. They differ only in terms of the QVStore organization (please refer to our paper to know more about QVStore). The QVStore for the basic version is simply defined as a two-dimensional table, whereas the featurewise version defines it as a hierarchical organization of multiple small tables. The implementation of the respective engines can be found in the src/ and inc/ directories.
- inc/feature_knowledge.h and src/feature_knowledge.cc define how to compute each program feature from the raw attributes of a demand request. If you want to define your own feature, extend the enum FeatureType in inc/feature_knowledge.h and define its corresponding process function.
- inc/util.h and src/util.cc contain all hashing functions used in our evaluation. Play around with them, as a better hash function can also provide performance benefits.
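The difference between the two QVStore organizations described above can be illustrated in a few lines. This is a conceptual sketch in Python, not a transcription of the C++ sources: the class names, the hash function, and the table sizes are hypothetical. It also assumes that the featurewise store combines per-feature Q-values by taking their maximum. Please check the paper and the src/ directory for the actual organization.

```python
def hash_index(value, size):
    """Hypothetical multiplicative hash; the real hash functions live in src/util.cc."""
    return (((value * 2654435761) & 0xFFFFFFFF) >> 16) % size


def fold(features):
    """Fold a tuple of integer features into one integer state value."""
    acc = 0
    for f in features:
        acc = (acc * 31 + f) & 0xFFFFFFFF
    return acc


class BasicQVStore:
    """One two-dimensional table indexed by (hashed full state, action)."""

    def __init__(self, num_rows, num_actions):
        self.num_rows = num_rows
        self.table = [[0.0] * num_actions for _ in range(num_rows)]

    def q(self, features, action):
        return self.table[hash_index(fold(features), self.num_rows)][action]

    def update(self, features, action, delta):
        self.table[hash_index(fold(features), self.num_rows)][action] += delta


class FeaturewiseQVStore:
    """One small table per feature; the state's Q-value combines the per-feature values."""

    def __init__(self, num_features, rows_per_table, num_actions):
        self.rows = rows_per_table
        self.tables = [[[0.0] * num_actions for _ in range(rows_per_table)]
                       for _ in range(num_features)]

    def q(self, features, action):
        # Assumption of this sketch: combine per-feature Q-values with max.
        return max(table[hash_index(f, self.rows)][action]
                   for table, f in zip(self.tables, features))

    def update(self, features, action, delta):
        for table, f in zip(self.tables, features):
            table[hash_index(f, self.rows)][action] += delta


store = FeaturewiseQVStore(num_features=2, rows_per_table=16, num_actions=4)
store.update((5, 9), action=2, delta=1.5)
print(store.q((5, 9), 2), store.q((5, 100), 2))
```

In this sketch, a state that shares even one feature value with a previously rewarded state inherits that feature's Q-value, whereas the basic table treats every distinct combination of features as an unrelated row.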
If you use this framework, please cite the following paper:
@inproceedings{bera2021,
author = {Bera, Rahul and Kanellopoulos, Konstantinos and Nori, Anant V. and Shahroodi, Taha and Subramoney, Sreenivas and Mutlu, Onur},
title = {{Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning}},
booktitle = {Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture},
year = {2021}
}
Distributed under the MIT License. See LICENSE for more information.
Rahul Bera - write2bera@gmail.com
We acknowledge support from SAFARI Research Group's industrial partners.