This is the official code repository for the paper "Achieving Sparse Activation in Small Language Models". We aim to achieve sparse activation in small language models (SLMs). We demonstrate that existing magnitude-based sparse activation cannot be applied to SLMs, and that gradient-based attribution scores are a better choice for sparse activation. By applying a corrective term to the existing GxO (gradient x output) attribution metric, our approach achieves an 80% sparsification ratio on SLMs with less than 5% accuracy loss.
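As background, GxO scores each unit by the product of its output and the gradient of the loss with respect to that output; the corrective term itself is described in the paper and is not reproduced here. Below is a minimal toy sketch of a GxO-style score (hypothetical names and a toy layer, not the repository's code):

```python
import torch

# Toy setup: attribute a scalar loss to the hidden activations of one linear layer.
# The layer, tensors, and shapes here are illustrative only.
hidden = torch.randn(1, 16, requires_grad=True)   # hidden activations (e.g., MLP neuron outputs)
out_proj = torch.nn.Linear(16, 4)
loss = out_proj(hidden).sum()

# Gradient of the loss w.r.t. each hidden activation.
(grad,) = torch.autograd.grad(loss, hidden)

# GxO score: |activation x gradient| per neuron -- a first-order estimate of how much
# the loss would change if that neuron's output were zeroed out.
gxo_score = (hidden * grad).abs().squeeze(0)
print(gxo_score)
```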
Install all the required packages.
pip install -r requirements.txt
We use Phi-2 and the TruthfulQA dataset as an example of sparse activation. The following steps generate accuracy-sparsity trade-off curves based on various metrics, including the proposed Corrected GxO metric.
Create the folder for the generated results
python3 folder_creation.py
Generate the labels for sparse activation
python3 label_generation.py
Use the following commands to generate the output magnitude of each attention head and each MLP neuron
python3 Mag_attention.py
python3 Mag_mlp.py
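For intuition, magnitude-based scoring records how large each unit's output is during a forward pass. This is a minimal, hypothetical sketch (not the repository's Mag_attention.py / Mag_mlp.py implementation), assuming Phi-2's Hugging Face module naming where MLP up-projections are named `mlp.fc1`; adjust the name filter for other models:

```python
import torch
from collections import defaultdict
from transformers import AutoModelForCausalLM, AutoTokenizer

# Record the mean |output| of each MLP up-projection neuron with forward hooks.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model.eval()

magnitudes = defaultdict(list)

def make_hook(name):
    def hook(module, inputs, output):
        # Average absolute activation over batch and sequence -> one value per neuron.
        magnitudes[name].append(output.detach().abs().mean(dim=(0, 1)))
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules() if n.endswith("mlp.fc1")]

with torch.no_grad():
    inputs = tokenizer("What is the capital of France?", return_tensors="pt")
    model(**inputs)

for h in handles:
    h.remove()

per_neuron_magnitude = {n: torch.stack(v).mean(0) for n, v in magnitudes.items()}
```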
Use the following commands to generate the various attribution-based scores of each attention head and each MLP neuron
python3 attribution_attention.py --metric gradient
python3 attribution_attention.py --metric gxo
Integrated Gradients (IG) with 20 interpolation steps (the number of steps can be any positive integer)
python3 attribution_attention.py --metric ig --n_steps 20
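For intuition about the --n_steps option, here is a minimal toy sketch of integrated gradients over hidden activations with a zero baseline; the function name and the toy layer are illustrative assumptions, not the repository's attribution_attention.py implementation:

```python
import torch

# Toy integrated-gradients (IG) estimate for hidden activations, using a zero baseline
# and n_steps interpolation points between the baseline and the real activations.
def integrated_gradients(hidden, loss_fn, n_steps=20):
    baseline = torch.zeros_like(hidden)
    total_grad = torch.zeros_like(hidden)
    for step in range(1, n_steps + 1):
        # Interpolate between the baseline (all-zero activations) and the real activations.
        point = (baseline + (step / n_steps) * (hidden - baseline)).detach().requires_grad_(True)
        loss = loss_fn(point)
        (grad,) = torch.autograd.grad(loss, point)
        total_grad += grad
    # IG score: (input - baseline) x average gradient along the interpolation path.
    return ((hidden - baseline) * total_grad / n_steps).abs()

out_proj = torch.nn.Linear(16, 4)
hidden = torch.randn(1, 16)
scores = integrated_gradients(hidden, lambda h: out_proj(h).sum(), n_steps=20)
```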
Apply sparse activation based on the output magnitude and the different attribution scores, and plot the accuracy-sparsity trade-off curves
python3 main.py --metric_name magnitude
python3 main.py --metric_name gradient
python3 main.py --metric_name gxo
python3 main.py --metric_name snip
python3 main.py --metric_name cor_gxo
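Conceptually, each metric's scores are used to deactivate the lowest-scoring units at a chosen sparsity ratio, and accuracy is re-measured at each ratio to trace the trade-off curve. Below is a minimal hypothetical sketch of that idea (not main.py); evaluate_accuracy is an assumed placeholder:

```python
import torch

def sparsity_mask(scores, sparsity_ratio):
    # Keep the top (1 - sparsity_ratio) fraction of units and zero out the rest.
    n_drop = int(sparsity_ratio * scores.numel())
    mask = torch.ones_like(scores)
    if n_drop > 0:
        drop_idx = torch.argsort(scores)[:n_drop]   # lowest-scoring units are deactivated
        mask[drop_idx] = 0.0
    return mask

# Sweep sparsity ratios to trace an accuracy-sparsity trade-off curve.
scores = torch.rand(100)                            # hypothetical per-neuron attribution scores
for ratio in [0.0, 0.2, 0.4, 0.6, 0.8]:
    mask = sparsity_mask(scores, ratio)
    # accuracy = evaluate_accuracy(model, mask)     # hypothetical evaluation helper
    print(ratio, int(mask.sum().item()), "units kept")
```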
The accuracy-sparsity trade-off curves can then be found under result/truthfulqa/res/both.
If you find this work useful, please cite our paper:
@article{song2024achieving,
title={Achieving Sparse Activation in Small Language Models},
author={Song, Jifeng and Huang, Kai and Yin, Xiangyu and Yang, Boyuan and Gao, Wei},
journal={arXiv e-prints},
pages={arXiv--2406},
year={2024}
}