- Implemented 3.5D architectures and fixed minor bugs.
- Added a visualization (tile_map.png) to illustrate the mapping.
- Added setter functions; the code (hisim_model.py) now wraps and refactors the analy_model.py file.
- Generated PPA_new, which uses a dictionary instead of a list, making it easier to track entries and to log bad configurations.
- Labeled the headers of PPA.csv with units.
- Changed relative paths to absolute paths and modified system exits to returns instead.
HISIM introduces a suite of analytical models at the system level to speed up performance prediction for AI models, covering logic-on-logic architectures across 2D, 2.5D, 3D and 3.5D integration.
The main directory structure of the repository is shown below. The file run_tb.py
contains example use cases for running the tool.
├── Debug -- Folder containing layer mapping and performance information obtained from simulations
├── Demos -- Folder containing demo videos of the tools
├── Module_AI_Map
│   ├── AI_Networks -- Folder containing configuration files for AI network models
│   └── util_chip -- Folder containing code to map the AI model onto the device using HISIM's default mapping
├── Module_Compute -- Folder containing code for PPA evaluation of the IMC compute core
├── Module_Network -- Folder containing code for PPA evaluation of routers and the AIB interface
├── Module_Thermal -- Folder containing thermal simulation code for 2D, 2.5D, and 3D architectures
├── Results
│   ├── result_thermal -- Folder containing thermal maps of the device obtained from simulations
│   ├── PPA.csv -- CSV file containing PPA and thermal values in list format
│   └── PPA_new.csv -- CSV file containing PPA and thermal values in dictionary format
├── hisim_model.py -- Main file used by run_tb.py; implements the setter functions
├── run_tb.py -- Run file to execute example runs using the HiSimModel defined in hisim_model.py
├── analy_model.py -- Main file used by run.py
└── run.py -- Run file to execute multiple runs of analy_model.py with different configurations
To facilitate extensibility in evaluating the performance of various AI algorithms, AI network files for different DNNs, GNNs, and transformers are provided at this link. The structure of the network.csv
file is as follows: IFM_size_x, IFM_size_y, N_IFM, Kx, Ky, NOFM, pool, layer-wise sparsity.
IFM_Size_x: Size of the layer input in the x-dimension
IFM_Size_y: Size of the layer input in the y-dimension
N_IFM: Number of input channels of the layer
Kx: Kernel size of the layer in the x-dimension
Ky: Kernel size of the layer in the y-dimension
NOFM: Number of output channels of the layer
pool: Indicates whether the layer is followed by pooling: 0 if not, 1 if it is
layer-wise sparsity: Total sparsity of the layer
For a fully connected (FC) or linear layer, Kx
and Ky
are both set to 1. For an example, refer to the ViT network here.
To run the example use cases, run the following command:

python run_tb.py

To install the dependencies, run the following command:

pip install -r requirements.txt
- Python 3.8.5
- pandas 1.1.3
- numpy 1.19.2
- torch 2.2.2
- matplotlib 3.3.2
- scipy 1.5.2
The input parameters of HiSimModel, their corresponding setter functions, and their options are as follows:

| Parameter | Setter Function | Parameter Options |
|---|---|---|
| --chip_architect | set_chip_architecture | Chip architecture: 2D chip (M2D), 3D chip (M3D), 2.5D chip (H2_5D), or 3.5D chip (M3_5D) |
| --xbar_size | set_xbar_size | RRAM crossbar size: 64, 128, 256, 512, or 1024 |
| --N_tile | set_N_tile | Number of tiles per tier: 4, 9, 16, 25, 36, or 49 |
| --N_pe | set_num_pe | Number of processing elements (PEs) per tile: 4, 9, 16, 25, or 36 |
| --N_crossbar | set_N_crossbar | Number of crossbars per PE: 4, 9, or 16 |
| --quant_weight | set_quant_weight | Precision of the quantized weights of the AI model |
| --quant_act | set_quant_act | Precision of the quantized activations of the AI model |
| --freq_computing | set_freq_computing | Clock frequency of the compute core in GHz |
| --fclk_noc | set_fclk_noc | Clock frequency of the network communication unit in GHz |
| --tsvPitch | set_tsv_pitch | 3D TSV (through-silicon via) pitch in micrometers (µm) |
| --N_tier | set_N_tier | Number of tiers in the chip for 3D architectures |
| --volt | set_volt | Operating voltage in volts |
| --placement_method | set_placement | Placement method: 1 = tier/chiplet edge to tier/chiplet edge connection; 5 = tile-to-tile 3D connection |
| --routing_method | set_router | Routing method: 1 = local routing (uses only nearby routers and TSVs); 2 = global routing (data attempts to use all available routers to reach the next tier) |
| --percent_router | set_percent_router | Percentage of routers used for 3D communication when data is routed from one tier to the next |
| --W2d | set_W2d | 2D NoC (network-on-chip) bandwidth |
| --router_times_scale | set_router_times_scale | Scaling factor for the time components of the router: trc, tva, tsa, tst, tl, tenq |
| --ai_model | set_ai_model | AI model: vit, gcn, resnet50, resnet110, vgg16, or densenet121 |
| --thermal | set_thermal | Set to True to run a thermal simulation; set to False otherwise |
| --N_stack | set_N_stack | Number of 3D stacks in a 3.5D design, or number of chiplets in a 2.5D design |
PPA of a 2D architecture can be estimated using analy_model.py with the following command:
python analy_model.py --chip_architect M2D --N_stack 1 --N_tier 1 --N_tile <Input the total number of tiles required> --ai_model <Input AI model>
Example command: python analy_model.py --chip_architect M2D --N_stack 1 --N_tier 1 --N_tile 169 --N_pe 36 --xbar_size 1024 --ai_model vit
Include the --thermal flag in the command to run a thermal simulation. The time required to execute a thermal simulation can vary from approximately 3 to 20 minutes.
Alternatively, to run a 2D simulation, apply the following settings:
-- Use HiSimModel.set_chip_architecture("M2D") to set the chip architecture to 2D
-- Use HiSimModel.set_N_tier(1) and HiSimModel.set_N_stack(1) to set the number of tiers and stacks to 1
-- Set the remaining input parameters based on the required hardware configuration
-- AI models can be specified as one of the following: vit, gcn, resnet50, resnet110, vgg16, densenet121
-- Use HiSimModel.run_model() to evaluate performance
-- Check Results/PPA.csv for PPA information and Results/tile_map.png for tile mapping information
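The steps above can be sketched as a small helper. The setter names come from the parameter table earlier in this document; the import path, tile count, and AI model are illustrative assumptions to adapt to your checkout.

```python
def configure_2d(model, ai_model="vit", n_tile=169):
    """Apply the 2D settings listed above to a HiSimModel instance.

    Setter names follow the parameter table above; the tile count and
    AI model defaults here are illustrative, not required values.
    """
    model.set_chip_architecture("M2D")  # 2D architecture
    model.set_N_tier(1)                 # single tier
    model.set_N_stack(1)                # single stack
    model.set_N_tile(n_tile)            # total tiles required by the model
    model.set_ai_model(ai_model)

# Usage (with the HISIM repository on the Python path):
#   from hisim_model import HiSimModel
#   m = HiSimModel()
#   configure_2d(m)
#   m.run_model()  # writes Results/PPA.csv and Results/tile_map.png
```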
PPA of a 2.5D architecture can be estimated using analy_model.py with the following command:
python analy_model.py --chip_architect H2_5D --placement_method 1 --N_tile <Input the number of tiles per chiplet> --N_stack <Input the number of chiplets> --ai_model <Input ai model>
Example command: python analy_model.py --chip_architect H2_5D --placement_method 1 --N_stack 3 --N_tier 1 --N_tile 64 --N_pe 36 --xbar_size 1024 --ai_model vit
Include the --thermal flag in the command to run a thermal simulation. The time required to execute a thermal simulation can vary from approximately 3 to 20 minutes.
Alternatively, to run a 2.5D simulation, apply the following settings:
-- Use HiSimModel.set_chip_architecture("H2_5D") to set the chip architecture to H2_5D
-- Use HiSimModel.set_N_tier(1) and HiSimModel.set_placement(1) to set the number of tiers and placement_method to 1
-- Set the remaining input parameters based on the required hardware configuration
-- AI models can be specified as one of the following: vit, gcn, resnet50, resnet110, vgg16, densenet121
-- Use HiSimModel.run_model() to evaluate performance
-- Check Results/PPA.csv for PPA information and Results/tile_map.png for tile mapping information
PPA of a 3D architecture can be estimated using analy_model.py with the following command:
python analy_model.py --chip_architect M3D --placement_method 5 --N_tier <Input the number of tiers to be tested for> --ai_model <Input ai model>
Example command: python analy_model.py --chip_architect M3D --placement_method 5 --N_stack 1 --N_tier 3 --N_tile 64 --N_pe 36 --xbar_size 1024 --ai_model vit
Include the --thermal flag in the command to run a thermal simulation. The time required to execute a thermal simulation can vary from approximately 3 to 20 minutes.
Alternatively, to run a 3D simulation, apply the following settings:
-- Use HiSimModel.set_chip_architecture("M3D") to set the chip architecture to M3D
-- Use HiSimModel.set_N_stack(1) and HiSimModel.set_placement(5) to set the number of stacks to 1 and placement_method to 5
-- Set the remaining input parameters based on the required hardware configuration
-- AI models can be specified as one of the following: vit, gcn, resnet50, resnet110, vgg16, densenet121
-- Use HiSimModel.run_model() to evaluate performance
-- Check Results/PPA.csv for PPA information and Results/tile_map.png for tile mapping information
PPA of a 3.5D architecture can be estimated using analy_model.py with the following command:
python analy_model.py --chip_architect M3_5D --placement_method 5 --N_tier <Input the number of tiers to be tested for> --N_stack <Input the number of 3D Stacks> --ai_model <Input ai model>
Example command: python analy_model.py --chip_architect M3_5D --placement_method 5 --N_stack 2 --N_tier 2 --N_tile 64 --N_pe 36 --xbar_size 1024 --ai_model vit
Thermal simulation for 3.5D has not yet been integrated into the code.
Alternatively, to run a 3.5D simulation, apply the following settings:
-- Use HiSimModel.set_chip_architecture("M3_5D") to set the chip architecture to M3_5D
-- Use HiSimModel.set_placement(5) to set the placement_method to 5
-- Set the remaining input parameters based on the required hardware configuration
-- AI models can be specified as one of the following: vit, gcn, resnet50, resnet110, vgg16, densenet121
-- Use HiSimModel.run_model() to evaluate performance
-- Check Results/PPA.csv for PPA information and Results/tile_map.png for tile mapping information
The workflow of the code is as follows: The AI model is first mapped onto the architecture using the default mapping in util_mapping.py, located in the Module_AI_Map folder. This process outputs layer_information.csv in the following format:
layer index, Number of tiles required for the layer, Number of PEs required for the layer, Number of rows of PEs for the layer, Number of columns of PEs for the layer, Number of input cycles for the layer, pooling, Number of tiles mapped until this layer, Total number of input activations for the layer, Tier/chiplet index that the layer is mapped to for this layer, Cell Bit Utilization for the layer, Average Utilization of a row for the layer, Total number of weight bits for the layer, Average Utilization of a column for the layer, Number of FLOPS of the layer.
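The column list above can be given short field names for downstream analysis. This is an illustrative sketch: the short names and the sample row are made up, and only the column order follows the list above.

```python
import csv
import io

# Short names for the layer_information.csv columns listed above,
# in the same order (names are this sketch's own, not HISIM's).
FIELDS = ["layer_index", "n_tiles", "n_pes", "pe_rows", "pe_cols",
          "input_cycles", "pooling", "tiles_mapped_so_far",
          "total_input_activations", "tier_or_chiplet_index",
          "cell_bit_utilization", "avg_row_utilization",
          "total_weight_bits", "avg_col_utilization", "flops"]

# Illustrative row (values are made up, not from a real simulation).
sample = "0,4,36,6,6,196,1,4,150528,0,0.85,0.9,2359296,0.75,235929600\n"

rows = list(csv.DictReader(io.StringIO(sample), fieldnames=FIELDS))
print(rows[0]["n_tiles"])  # tiles required by layer 0
```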
The performance of each layer is estimated based on the layer mapping, assuming an Analog IMC PE. The layer performance is output in the following format:
layer index, number of tiles required for the layer, latency of the layer, energy of the layer, leakage energy of the layer, average power consumption of each tile for the layer
The performance of the network and interconnect is then estimated based on the number of tiles, the placement method, and the percentage of routers. The thermal simulation is performed using power and area maps, and it outputs the peak temperature, average temperature, and thermal maps. Lastly, all the results are stored in the output file PPA.csv.
The structure of output PPA.csv file is as follows:
freq_core (GHz),freq_noc (GHz),Xbar_size,N_tile,N_pe,N_tile(real),N_tier(real),N_stack(real),W2d,W3d,Computing_latency (ns),Computing_energy (pJ),compute_area (um2),chip area (mm2),chip_Architecture,2d NoC latency (ns),3d NoC latency (ns),2.5d NoC latency (ns),network_latency (ns),2d NoC energy (pJ),3d NoC energy (pJ),2.5d NoC energy (pJ),network_energy (pJ),rcc (compute latency/communication latency),Throughput(TFLOPS/s),2D_3D_NoC_power (W),2_5D_power (W),2d_3d_router_area (mm2),placement_method,percent_router,peak_temperature (C),thermal simulation time (s),networking simulation time (s),computing simulation time (s),total simulation time (s)
The parameters freq_core (GHz), freq_noc (GHz), Xbar_size, N_tile, N_pe, N_tile(real), N_tier(real), N_stack(real), W2d, placement_method, and percent_router are the input parameters of the simulation.
Outputs from the simulation:
W3d: 3D TSV Bandwidth
N_tile(real): Number of real tiles mapped in a tier
Computing_latency(ns): Total Latency of the computing core
Computing_energy(pJ): Total Energy of the computing core
compute_area(um2): Total Area of the computing core
chip_area(mm2): Total Chip Area
2d NoC latency(ns): Total Latency of the 2D NoC
3d NoC latency(ns): Total Latency of the 3D TSV
2.5d NoC latency(ns): Total Latency of the 2.5D AIB interface
network_latency(ns): Total Network Latency consisting of 2D NoC, 3D TSV, 2.5D AIB interface latencies
2d NoC energy(pJ): Total Energy of the 2D NoC
3d NoC energy(pJ): Total Energy of the 3D TSV
2.5d NoC energy(pJ): Total Energy of the 2.5D AIB interface
network_energy(pJ): Total Network Energy consisting of 2D NoC, 3D TSV, 2.5D AIB interface energies
rcc: Ratio between computation and communication latencies
Throughput(TFLOPS/s): Total number of FLOPS divided by latency
2D_3D_NoC_power(W): Total power of the 2D NoC and 3D TSV
2_5D_power(W): Total power of the 2.5D AIB interface
2d_3d_router_area(mm2):Total area of the 2D and 3D router
peak_temperature (K): Peak temperature of the chip
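As a sketch of consuming PPA.csv, the snippet below parses a synthetic fragment containing a subset of the headers above and recomputes rcc from the two latency columns. The header strings are copied from the list above; the numeric values are made up for illustration.

```python
import csv
import io

# Synthetic PPA.csv fragment (subset of the real headers; made-up values).
sample = (
    "Computing_latency (ns),network_latency (ns),Throughput(TFLOPS/s)\n"
    "2000.0,1000.0,12.5\n"
)

row = next(csv.DictReader(io.StringIO(sample)))
compute = float(row["Computing_latency (ns)"])
network = float(row["network_latency (ns)"])
rcc = compute / network  # ratio of compute to communication latency
print(rcc)  # 2.0 for this synthetic row
```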
The Tile Map is a visual representation of the default mapping implemented in HISIM. It displays the stacks, tiers, and tiles present in the architecture, and it links each tile to the AI layer number to which it is mapped. Note that multiple tiles can be mapped to a single AI layer. This tile map can be found at Results/tile_map.png.
The Thermal Map is a visual representation of the temperature distribution across the components of each tier. It shows the temperature profile of the tiles and routers on each tier, including the 2.5D connections. These temperature maps can be found in the folder Results/result_thermal/1stacks for 3D runs and in the folder Results/result_thermal/ for 2.5D runs.
A demo video has been added to the repository to help users get started, showcasing a few examples using run_tb.py. The test cases, their respective outputs, AI networks, hardware configuration, and DSE parameters inside run_tb.py are as follows:
| Test Case | Output | AI Network | HW configuration (Xbar-Npe-Ntile-Ntier-Nstack-arch) | DSE parameter |
|---|---|---|---|---|
| Test Case 1 | PPA | ViT | 1024-9-100-2-2-3.5D | NA (single run) |
| Test Case 2 | PPA | densenet121 | 1024-36-64-2-2-3.5D | NA (single run) |
| Test Case 3 | PPA | densenet121 | 1024-36-81-2-1-3D | tsv_pitch: [2, 3, 4, 5, 10, 20] |
| Test Case 4 | PPA | densenet121 | 1024-36-81-2-1-3D | noc_width (W2d): [i for i in range(1, 32, 5)] |
| Test Case 5 | PPA, thermal | densenet121 | 1024-36-169-varies-1-3D | N_tier: [i for i in range(4)] |
| Test Case 6 | PPA, thermal | densenet121 | 1024-36-81-2-1-3D | NA (single run) |
| Test Case 7 | PPA, thermal | ViT | 1024-9-169-2-1-3D | NA (single run) |
| Test Case 8 | PPA, thermal | densenet121 | 1024-36-81-1-2-2.5D | NA (single run) |
Alternatively, the demo video located at Demos/demo-05172024.mp4 demonstrates examples using run.py. Each of the required parameters for the design space can be configured as an array. To include thermal simulations in the design space exploration, add the --thermal flag to the Python command for the run.py file.
If you find this tool useful, please use the following BibTeX entries to cite us:
@INPROCEEDINGS{10396377,
author={Wang, Zhenyu and Sun, Jingbo and Goksoy, Alper and Mandal, Sumit K. and Seo, Jae-Sun and Chakrabarti, Chaitali and Ogras, Umit Y. and Chhabria, Vidya and Cao, Yu},
booktitle={2023 IEEE 15th International Conference on ASIC (ASICON)},
title={Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling},
year={2023},
volume={},
number={},
pages={1-4},
keywords={Analytical models;Three-dimensional displays;Computational modeling;Multichip modules;Benchmark testing;Data models;Artificial intelligence;Heterogeneous Integration;2.5D;3D;Chiplet;ML accelerators;Electro-thermal Co-design},
doi={10.1109/ASICON58565.2023.10396377}}
@INPROCEEDINGS{10473875,
author={Wang, Zhenyu and Sun, Jingbo and Goksoy, Alper and Mandal, Sumit K. and Liu, Yaotian and Seo, Jae-Sun and Chakrabarti, Chaitali and Ogras, Umit Y. and Chhabria, Vidya and Zhang, Jeff and Cao, Yu},
booktitle={2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)},
title={Exploiting 2.5D/3D Heterogeneous Integration for AI Computing},
year={2024},
volume={},
number={},
pages={758-764},
keywords={Analytical models;Three-dimensional displays;Computational modeling;Wires;Multichip modules;Benchmark testing;Transformers;Heterogeneous Integration;2.5D;3D;Chiplet;ML accelerators;Performance Analysis},
doi={10.1109/ASP-DAC58780.2024.10473875}}
Main devs:
- Zhenyu Wang
- Pragnya Sudershan Nalla
- Jingbo Sun
- Emad Haque
Contributors:
- A. Alper Goksoy
Maintainers and Advisors:
- Sumit K. Mandal
- Jae-sun Seo
- Vidya A. Chhabria
- Jeff Zhang
- Chaitali Chakrabarti
- Umit Y. Ogras
- Yu Cao