AccelTran is a tool for simulating a design space of accelerators for the diverse, flexible, and heterogeneous transformer architectures supported by the FlexiBERT 2.0 framework (jha-lab/txf_design-space).
The figure below shows the utilization of different modules in an AccelTran architecture for the BERT-Tiny transformer model.
git clone https://github.com/JHA-Lab/acceltran.git
cd ./acceltran/
git submodule init
git submodule update
The Python environment setup is based on conda. The script below creates a new environment named txf_design-space:
source env_setup.sh
For pip installation, we are creating a requirements.txt file. Stay tuned!
Synthesis scripts use Synopsys Design Compiler. All hardware modules are implemented in SystemVerilog in the directory synthesis/top.
To get area and power consumption reports for each module, use the following command:
cd ./synthesis/
dc_shell -f 14nm_sg.tcl -x "set top_module <M>"
cd ..
Here, <M> is the module to be synthesized: mac_lane, ln_forward_<T> (for layer normalization), softmax_<T>, etc., where <T> is the tile size (8, 16, or 32).
All output reports are stored in synthesis/reports.
To run the synthesis for the DMA module, run the following command instead:
cd ./synthesis/
dc_shell -f dma.tcl
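When several modules and tile sizes need reports, the per-module synthesis command above can be generated in a loop. The following is a minimal sketch, not part of the repository: the helper names expand_modules and synthesis_command are hypothetical, and the module list contains only the names mentioned in this README. It prints the commands as a dry run and does not invoke dc_shell.

```python
import shlex

# Module name templates taken from this README; "<T>" marks the tile size.
# The real synthesis/top directory may contain more modules than these.
MODULE_TEMPLATES = ["mac_lane", "ln_forward_<T>", "softmax_<T>"]
TILE_SIZES = [8, 16, 32]


def expand_modules(templates, tile_sizes):
    """Expand "<T>" placeholders into one concrete module name per tile size."""
    modules = []
    for template in templates:
        if "<T>" in template:
            modules.extend(template.replace("<T>", str(t)) for t in tile_sizes)
        else:
            modules.append(template)
    return modules


def synthesis_command(module):
    """Return the dc_shell invocation for one module as an argument list."""
    return ["dc_shell", "-f", "14nm_sg.tcl", "-x", f"set top_module {module}"]


if __name__ == "__main__":
    for module in expand_modules(MODULE_TEMPLATES, TILE_SIZES):
        # Dry run: print the command that would be run from ./synthesis/.
        print(shlex.join(synthesis_command(module)))
```

To actually run the synthesis, each argument list could be passed to subprocess.run with cwd set to the synthesis directory, assuming dc_shell is on the PATH.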
To get the sparsity in activations and weights in an input transformer model and its corresponding performance on the GLUE benchmark, use the dynamic pruning model: DP-BERT.
To test the effect of different sparsity ratios on the model performance on the SST-2 benchmark, use the following script:
cd ./pruning/
python3 run_evaluation.py --task sst2 --max_pruning_threshold 0.1
cd ..
The script uses a weight-pruned model, so the weights are not pruned further. To also prune the weights with a pruning_threshold, use the flag --prune_weights.
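To build intuition for how a pruning threshold relates to sparsity, the sketch below zeroes every value whose magnitude falls below a threshold and reports the resulting fraction of zeros. It assumes magnitude-based thresholding; the actual DP-BERT implementation may differ, and the function names and sample values here are made up for illustration.

```python
def prune_by_magnitude(values, threshold):
    """Zero every entry whose absolute value is below the threshold (assumed rule)."""
    return [0.0 if abs(v) < threshold else v for v in values]


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


# Made-up activation values, for illustration only.
activations = [0.02, -0.5, 0.08, 1.3, -0.01, 0.0, 0.25, -0.09]

for threshold in (0.0, 0.05, 0.1):
    pruned = prune_by_magnitude(activations, threshold)
    print(f"threshold={threshold:.2f} sparsity={sparsity(pruned):.3f}")
```

A larger threshold removes more small-magnitude values, so sparsity rises monotonically with the threshold; the evaluation script above measures how model performance changes along that trade-off.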
AccelTran supports a diverse range of accelerator hyperparameters. It also supports all ~1088 models in the FlexiBERT 2.0 design space.
To specify the configuration of an accelerator's architecture, use a configuration file in the simulator/config directory. Example configuration files are provided for accelerators optimized for BERT-Nano and BERT-Tiny. Accelerator hardware configuration files should conform to the design space specified in the simulator/design_space/design_space.yaml file.
To specify the transformer model parameters, use a model dictionary file in simulator/model_dicts. Model dictionaries for BERT-Nano and BERT-Tiny have already been provided for convenience.
To run AccelTran on the BERT-Tiny model, while plotting utilization and metric curves every 1000 cycles, use the following command:
cd ./simulator/
python3 run_simulator.py --model_dict_path ./model_dicts/bert_tiny.json --config_path ./config/config_tiny.yaml --plot_steps 1000 --debug
cd ..
This will output the accelerator state for every cycle. For more information on the possible inputs to the simulation script, use:
cd ./simulator/
python3 run_simulator.py --help
cd ..
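When running the simulator for several model and configuration pairs, it can help to check the input files and assemble the command programmatically. The sketch below is not part of the repository: simulator_command and missing_inputs are hypothetical helpers, and only the flags shown in the command above are used. It is meant to be run from the simulator directory.

```python
from pathlib import Path


def simulator_command(model_dict, config, plot_steps=None, debug=False):
    """Build the run_simulator.py invocation as an argument list."""
    cmd = [
        "python3", "run_simulator.py",
        "--model_dict_path", str(model_dict),
        "--config_path", str(config),
    ]
    if plot_steps is not None:
        cmd += ["--plot_steps", str(plot_steps)]
    if debug:
        cmd.append("--debug")
    return cmd


def missing_inputs(*paths):
    """Return the paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    model_dict = "./model_dicts/bert_tiny.json"
    config = "./config/config_tiny.yaml"
    missing = missing_inputs(model_dict, config)
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        # Dry run: print the command instead of executing it.
        print(" ".join(simulator_command(model_dict, config, plot_steps=1000, debug=True)))
```

Checking the paths first gives an early, readable error before a long simulation starts with a mistyped file name.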
Shikhar Tuli. For any questions, comments or suggestions, please reach me at stuli@princeton.edu.
Cite our work using the following BibTeX entry:
@article{tuli2023acceltran,
title={{AccelTran}: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers},
author={Tuli, Shikhar and Jha, Niraj K},
journal={arXiv preprint arXiv:2302.14705},
year={2023}
}
If you use the AccelTran design space to implement transformer-accelerator co-design, please also cite:
@article{tuli2023transcode,
title={{TransCODE}: Co-design of Transformers and Accelerators for Efficient Training and Inference},
author={Tuli, Shikhar and Jha, Niraj K},
journal={arXiv preprint arXiv:2303.14882},
year={2023}
}
BSD-3-Clause. Copyright (c) 2022, Shikhar Tuli and Jha Lab. All rights reserved.
See License file for more details.