CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems
With the growing power density in both cores and memories (esp. 3D), thermal issues significantly impact performance and reliability. Thus, increasingly researchers have become interested in understanding the performance, power, and thermal effects of the proposed changes in hardware and software. CoMeT is an integrated Core and Memory Thermal simulation toolchain, providing performance, power, and temperature parameters at regular intervals (epoch) for both cores and memory. It enables computer architects to evaluate various core and main memory integration options (3D, 2.5D, 2D) and analyze runtime management policies.
CoMeT extends the Sniper multicore performance simulator's source code to provide DRAM access information per memory bank (at regular intervals). It emits the access count for reads and writes separately, which can be helpful for memories having asymmetric read/write energy and delay (e.g., NVM). Periodically, using McPAT and CACTI, the core and memory power are computed and fed to HotSpot for (temperature-dependent leakage power-aware) thermal analysis. A thermal management policy monitors the temperature and, in the case of core or memory heating, it redistributes/reduces the power, then the performance simulation is resumed.
Following are the salient features:
- Supports various main memory types and their integration to cores (2D off-chip DDR, 3D off-chip memory, 2.5D integration, and 3D stacking of core and memory).
- Has a built-in temperature video generation tool, namely HeatView, which supports all core-memory configurations. Additionally, for 3D architectures, a video with a layer-wise 2D view is generated.
- A default thermal management policy with an OnDemand governer and open scheduler is included to quick-start the design process. Designers can easily modify the default policy and evaluate different thermal management approaches.
- To ease user development and reduce debugging, CoMET provides an automatic build verification test suite (smoke testing) that checks critical functionalities across various architectures. Users can easily add test cases to the smoke tests.
- Provides an automated grid-based floorplan generator (floorplanlib), which supports the generation of 2D, 2.5D, and 3D floorplans.
- Supports PARSEC, SPLASH-2, and SPEC CPU2017 benchmark suites. Users can also run their benchmarks.
- Using the SimulationControl feature, users can run simulations in batch mode, taking the list of workloads (mixes of benchmarks) and configurations as input. Further, to enable detailed output analysis, SimulationControl generates additional outputs, such as performance, power, temperature variation (versus time) graphs, and detailed CPI bar charts.
CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5 D, and 3D Processor-Memory Systems
Details of CoMeT can be found in our TACO 2022 paper, and please consider citing this paper in your work if you find this tool useful in your research.
Lokesh Siddhu, Rajesh Kedia, Shailja Pandey, Martin Rapp, Anuj Pathania, Jörg Henkel, and Preeti Ranjan Panda. "CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5 D, and 3D Processor-Memory Systems". "ACM Transactions on Architecture and Code Optimization" Volume 19 Issue 3 Article No.: 44 pp 1–25 https://doi.org/10.1145/3532185.
Please refer to CoMeT User Manual to learn how to write custom scheduling policies that perform thermal-aware Dynamic Voltage Frequency Scaling (DVFS), Memory Low-Power Mode, Task Mappings, and Task Migrations.
sudo apt install git make python gcc
git clone https://github.com/marg-tools/CoMeT.git
Download and extract Pinplay 3.2 to the root CoMeT directory as pin_kit
wget --user-agent="Mozilla" https://www.intel.com/content/dam/develop/external/us/en/protected/pinplay-drdebug-3.2-pin-3.2-81205-gcc-linux.tar.gz
tar xf pinplay-drdebug-3.2-pin-3.2-81205-gcc-linux.tar.gz
mv pinplay-drdebug-3.2-pin-3.2-81205-gcc-linux pin_kit
CoMeT compiles and runs inside a Docker container. Therefore, we need to download & install Docker. For more info: https://docs.docker.com/engine/install/ubuntu/
After installing Docker, let us now create a container
using the Dockerfile
.
cd docker
make # build the Docker image
make run # starts running the Docker image. Please ignore "docker groups: cannot find name for group id 1000"
cd .. # return to the base Sniper directory (while running inside of Docker)
make # or use 'make -j N' where N is the number of cores in your machine to use parallel make
Let us compile the [HotSpot] simulator, which shipped with CoMeT.
cd hotspot_tool/
make
cd ..
cd test/thermal_example
make run | tee logfile # Runs application, displays DRAM bank accesses, outputs temperature files
-
The output of
make run
displays the time interval or epoch (in µs) in which DRAM access was made, #reads and #writes, and reports the number of DRAM accesses directed to a particular bank. Further, detailed power, temperature traces at epoch level are generated. -
To enable the above performance, power, and temperature outputs, we have added
-s memTherm_core
and-c gainestown_3D
in the Sniper run command (please see Makefile). The above flags can be used to enable CoMeT simulation for any Sniper compatible executable. -
Sample output: Apart from Sniper messages and command line, we see a detailed bank-level trace for DRAM accesses. Please note the terminal output with the default epoch of 1 ms (= 1000 µs) shown below.
Time #READs #WRITEs #Access Address #BANK Bank Counters
@& 1000 10455 8710 19165 144, 132, 151, 162, 149, 160, 144, 130, 145, 140, 143, 164, 147, 158, 145, 133, 142, 131, 148, 156, 144, 155, 140, 134, 147, 129, 143, 162, 147, 167, 139, 129, 140, 130, 156, 155, 144, 153, 144, 138, 156, 137, 155, 157, 150, 169, 145, 142, 152, 137, 156, 157, 144, 156, 138, 136, 147, 127, 142, 160, 147, 160, 142, 129, 138, 133, 151, 156, 145, 155, 143, 135, 145, 129, 144, 157, 143, 162, 143, 130, 144, 129, 149, 170, 147, 164, 144, 128, 145, 132, 144, 155, 149, 164, 146, 133, 275, 254, 280, 282, 143, 163, 150, 134, 152, 125, 146, 166, 141, 164, 143, 126, 142, 130, 146, 153, 139, 156, 144, 136, 150, 126, 139, 156, 148, 165, 148, 130,
@& 2000 15742 12212 27954 206, 188, 225, 249, 240, 267, 197, 164, 229, 219, 201, 225, 193, 196, 244, 235, 205, 191, 226, 246, 241, 264, 196, 167, 229, 217, 202, 220, 193, 196, 244, 235, 205, 191, 226, 246, 241, 264, 196, 167, 236, 218, 208, 225, 196, 205, 248, 240, 212, 193, 233, 251, 241, 267, 197, 165, 230, 215, 202, 223, 193, 199, 245, 233, 206, 189, 226, 249, 241, 267, 197, 165, 230, 220, 202, 218, 188, 202, 250, 230, 211, 196, 223, 251, 241, 265, 200, 170, 229, 222, 203, 216, 190, 203, 255, 236, 215, 193, 231, 250, 244, 264, 199, 167, 234, 215, 197, 229, 194, 196, 244, 236, 204, 191, 228, 247, 242, 264, 196, 168, 233, 211, 199, 227, 196, 200, 249, 239,
.
.
.
.
Total number of DRAM read requests = 48989
Total number of DRAM write requests = 32774
-
Sum of DRAM read requests and write requests equals num dram accesses in sim.out file.
- You can also specify --roi flag in config file to obtain DRAM access trace for a region of interest.
-
Selected useful files: Multiple files containing simulation outputs will be generated (sim.cfg, sim.out, etc.), but the useful ones are described below, these files would have _mem and_core suffix (instead of prefix combined_) to indicate if they are for memory or core temperature simulation:
- combined_temperature.trace - the temperature trace of core and memory at periodic intervals combined together.
- combined_power.trace - the power trace of core and memory at periodic intervals combined together.
- full_temperature.trace (core and mem) - the temperature trace at periodic intervals for various banks and logic cores in the 3D memory. core trace is not generated in case of a 2.5D and 3D architecture.
- logfile - the simulation output from the terminal. bank_access_counter lists the access counts for different banks.
If you are able to verify this, then you have successfully run an application.
Click here to open details
CoMeT can be configured for various memory and core configurations.
We show changing input configuration, from stacked (core + 3D memory) to off-chip 3D memory, for the thermal_example test case.
#Change to appropriate working directory
cd test/thermal_example
#Change configuration from gainestown_3D to gainestown_3Dmem. Can be done in a text editor also.
sed -i 's/-c gainestown_3D/-c gainestown_3Dmem/g' Makefile
#Running CoMeT
make run > logfile
-
Setting up input configuration: Open Makefile and change the config file used (specified with -c option in the sniper command). The options are as follows:
- gainestown_DDR - 2x2 core and an external 4x4 bank DDR main memory (2D memory).
- gainestown_3Dmem - 2x2 core and an external 4x4x8 banks 3D main memory.
- gainestown_2_5D - 2x2 core and a 4x4x8 banks 3D main memory integrated on the same die (2.5D architecture).
- gainestown_3D - 2x2 core on top of a 4x4x8 banks 3D main memory.
Click here to open details
- To generate the thermal trace video (for stacked 4-core and 3D, 8 layer, 128 bank memory architechure), please run
python3 ../../../scripts/heatView.py --cores_in_x 2 --cores_in_y 2 --cores_in_z 1 --banks_in_x 4 --banks_in_y 4 --banks_in_z 8 --arch_type 3D --traceFile combined_temperature.trace --output maps
. The video will be an avi file generated in the maps folder using the combined_temperature.trace. Detailed command line arguments for HeatView are given below.
Usage: python3 heatView.py arguments
Switches and command-line arguments:
--cores_in_x: Number of cores in x dimension (default 4)
--cores_in_y: Number of cores in y dimension (default 4)
--cores_in_z: Number of cores in z dimension (default 1)
--banks_in_x: Number of memory banks in x dimension (default 4)
--banks_in_y: Number of memory banks in y dimension (default 4)
--banks_in_z: Number of memory banks in z dimension (default 8)
--arch_type: Architecture type = 3D or no3D (default no3D)
--plot_type: Generated view = 3D or 2D (default 3D)
--layer_to_view: Layer number to view in 3D plot (starting from 0) (default 0)
--type_to_view: Layer type to view in 3D plot (CORE or MEMORY) (default MEMORY)
--verbose (or -v): Enable verbose output
--inverted_view (or -i): Enable inverted view (heat sink on bottom)
--debug: Enable debug priting
--tmin: Minimum temperature to use for scale (default 65 deg C)
--tmax: Maximum temperature to use for scale (default 81 deg C)
--samplingRate (or -s): Sampling rate, specify an integer (default 1)
--traceFile (or -t): Input trace file (no default value)
--output (or -o): output directory (default maps)
--clean (or -c): Clean if directory exists
Click here to open details
Open Scheduler
- features
- random arrival times of workloads (open system)
- API for application mapping and DVFS policies
- enable with
type=open
in base.cfg
Configuration Help for Open Scheduler
- task arrival times: use the config parameters in
scheduler/open
inbase.cfg
- mapping: select logic with
scheduler/open/logic
and configure with additional parameters (core_mask
,preferred_core
) - DVFS: select logic with
scheduler/open/dvfs/logic
and configure accordingly
These policies are implemented in common/scheduler/policies
.
Mapping policies derive from MappingPolicy
, DVFS policies derive from DVFSPolicy
.
After implementing your policy, instantiate it in SchedulerOpen::initMappingPolicy
/ SchedulerOpen::initDVFSPolicy
.
Click here to open details
- Running automated test suite to ensure working of different features of CoMeT
cd test/test-installation
make run
-
As each system configuration is successfully simulated, you will see messages as below
- Running test case with configuration gainestown_3D
- Finished running test case with configuration gainestown_3D.cfg
- Test case passed for configuration gainestown_3D.cfg
- OR Test case failed for configuration gainestown_3D.cfg. Please check test/test-installation/comet_results/gainestown_3D/error_log for details.
- Video for gainestown_3D saved in test/test-installation/comet_results/gainestown_3D/maps
- OR Video generation failed for configuration gainestown_3D.cfg. Check test/test-installation/comet_results/gainestown_3D/video_gen_error.log for details.
- Result saved in test/test-installation/comet_results/gainestown_3D
- make clean
-
After the test finishes successfully, a folder "comet_results" will be created in the same folder
- It contains sub-folders, one for each system configurations (DDR, 3Dmem, 3D and 2_5_D)
- Each sub-folder contains architecture simulation files and thermal simulation files for the test case
- For per epoch DRAM access trace and Sniper log of test case, please refer simulation_log file
- For thermal simulation results, please refer to full_temperature.trace file and other related files
-
Video generation
- If the simulation for a configuration finishes successfully and pre-requisites for generating videos are installed in your host machine, then the video is generated inside "video" folder of that configuration.
- If the simulation for a configuration crashes, no video is generated. Further, an error_log is generated for that configuration stating why simulation failed.
- If the simulation finishes successfully but pre-requisites for generating videos are not met, a file named video_gen_error.log is generated to report the error for that configuration.
-
Test summary
- The complete summary of the running the test suite is written to a file named test_summary.
- Also, some logs are printed during the execution of test_suite.
Click here to open details
The floorplan creation helpers are an optional tool, you can also use your custom floorplans instead. Usage:
- create floorplans (and layer configuration files, HotSpot configuration files)
- change configuration to reference to the created files (for an example see gainestown_*)
python3 floorplanlib/create.py \
--mode DDR \
--cores 4x4 --corex 1mm --corey 1mm \
--banks 8x8 --bankx 0.9mm --banky 0.9mm \
--out my_2d_floorplan
python3 floorplanlib/create.py \
--mode 3Dmem \
--cores 4x4 --corex 1mm --corey 1mm \
--banks 8x8x2 --bankx 0.9mm --banky 0.9mm \
--out my_3d_oc_floorplan
python3 floorplanlib/create.py \
--mode 2.5D \
--cores 4x4 --corex 1mm --corey 1mm \
--banks 8x8x2 --bankx 0.9mm --banky 0.9mm \
--core_mem_distance 7mm \
--out my_2.5d_floorplan
python3 floorplanlib/create.py \
--mode 3D \
--cores 4x4 --corex 0.9mm --corey 0.9mm \
--banks 8x8x4 --bankx 0.45mm --banky 0.45mm \
--out my_3d_floorplan
Click here to open details
#setting $GRAPHITE_ROOT to CoMeT's root directory
export GRAPHITE_ROOT=$(pwd)
cd benchmarks
#setting $BENCHMARKS_ROOT to the benchmarks directory
export BENCHMARKS_ROOT=$(pwd)
#compiling the benchmarks
make
#Running the benchmarks
make run
- You will see that compilation only passes for PARSEC and SPLASH benchmarks, and fails for SPEC benchmarks. Ignore the failed compilation for SPEC benchmarks.
- For the SPEC 2017 benchmarks,
- Download the pinballs from the below link
- Create a folder "SPEC" inside test folder
- Extract the pinballs inside test/SPEC
- Run the benchmark. Given below is an example for a 4-core simulation.
cd test/SPEC ../../../../../run-sniper -v -s memTherm_core -c gainestown_3Dmem -n 4 --pinballs $SIM_PATH,$SIM_PATH,$SIM_PATH,$SIM_PATH
- $SIM_PATH represents path of a specific .address for the SPEC benchmark
cd test/darknet
# Compiling the darknet source code
make
# Running 4 instances of AlexNet on gainestown_3Dmem architecture
# Download alexnet.weights (pre-trained model for AlexNet) from https://pjreddie.com/darknet/imagenet/
./run.sh
- Running the entire darknet source code would take days.
- Insert appropriate region of interest (ROI) markers in the source code, depending on the phase of DNN you want to simulate.
- Files of interest in darknet framework are inside src folder.
- Relevant functions -- load_network(), forward_network(), train_classifier(), try_classifier(), predict_classifier()
- Use SimRoiStart() and SimRoiEnd() to specify ROI.
Click here to open details
- features
- batch run many simulations with different configurations
- annotate configuration options in config files (e.g., in
base.cfg
orgainestown_3D.cfg
) with tags following the format# cfg:<TAG>
- specify list of tags per run in
run.py
. Only the associated configuration options will be enabled - for an example: see
example
function inrun.py
andscheduler/open/dvfs/constFreq
inbase.cfg
to run an application at different frequencies - IMPORTANT: make sure that all your configuration options have a match in
base.cfg
- annotate configuration options in config files (e.g., in
- create plots of temperature, power, etc. over time
- create video of temperature (with HeatView)
- API to automatically parse finished runs (
resultlib
)
- batch run many simulations with different configurations
- usage
- configure basic settings in
simulationcontrol/config.py
- specify your runs in
simulationcontrol/run.py
python3 run.py
- print overview of finished simulations:
python3 parse_results.py
- configure basic settings in
Quickly list the finished simulations:
cd simulationcontrol
PYTHONIOENCODING="UTF-8" python3 parse_results.py
Each run is stored in a separate directory in the results directory (see 4). For quick visual check, many plots are automatically generated for you (IPS, power, etc).
To do your own (automated) evaluations, see the simulationcontrol.resultlib
package for a set of helper functions to parse the results. See the source code of parse_results.py
for a few examples.
Click here to open details
CoMeT also allows running subcore level simulations if one is interested to find out the hot regions within a core. To enable subcore simulations in CoMeT, do the following:
- In
config/base.cfg
, there is a section named[core_power]
. Make everything as true except for the tp and l3 being false. These reflect various sub components of the core. - A single core floorplan template is at ./config/hotspot/3Dmem_subcore/template.flp. We use this template for our simulations. If needed, one can change the dimensions.
- The
floorplanlib
generates multi-core floorplan by repeating these subcore components. A default created floorplan is available (4-core, 3D-ext configuration) in./config/hotspot/3Dmem_subcore
- The extra switch to generate such multi-core floorplan is: --subcore-template config/hotspot/3Dmem_subcore/template.flp - A default config file is available at
config/gainestown_3Dmem_subcore.cfg
. Use this for the simulation (or create own as per the requirements) - PLEASE NOTE THAT heatview DOES NOT SUPPORT subcore video generation.
Sniper: http://snipersim.org
McPat: https://www.hpl.hp.com/research/mcpat/
CACTI: https://hpl.hp.com/research/cacti/
HotSpot: http://lava.cs.virginia.edu/HotSpot/
HotSniper: https://github.com/anujpathania/HotSniper
MatEx: http://ces.itec.kit.edu/846.php
thermallib: https://github.com/ma-rapp/thermallib
Darknet Open Source Neural Networks in C: https://pjreddie.com/darknet/