ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference (AAAI 2024)
Code for the paper titled "ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference" [AAAI 2024 Main Track]
Since early exiting for accelerating large language model inference is a new area, we are open to any advice for improving our work.
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng*, Yihuai Hong*, Hongliang Dai, Huiping Zhuang, Cen Chen
* equal contribution
- We propose ConsistentEE, an early exiting method that achieves consistency between training and inference by formulating early exiting as a reinforcement learning problem (see the policy sketch after this list).
- We propose a concept named Memorized Layer to measure the hardness of an instance. We incorporate it into the reward function so that each instance can balance accuracy against acceleration according to its own hardness (see the reward sketch below).
- Experimental results show that our method outperforms other baselines on both natural language understanding and generation tasks.
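To make the reinforcement learning formulation concrete, the following is a minimal PyTorch-style sketch (batch size 1) of how per-layer policy heads could sample an exit layer during training. All names here (ExitPolicy, sample_exit_layer) are illustrative assumptions, not the repo's actual API.

import torch
import torch.nn as nn

class ExitPolicy(nn.Module):
    # Policy head for one layer: probability of exiting at that layer.
    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, pooled):
        # pooled: (1, hidden_size) representation at this layer
        return torch.sigmoid(self.score(pooled)).squeeze(-1)

def sample_exit_layer(policies, pooled_states):
    # Walk the layers bottom-up, sampling exit/continue at each one.
    # Returns the exit layer and the log-probability of the sampled
    # trajectory, for a REINFORCE-style policy-gradient update.
    log_prob = torch.zeros(1)
    num_layers = len(pooled_states)
    for i, (policy, pooled) in enumerate(zip(policies, pooled_states)):
        if i == num_layers - 1:
            return i, log_prob  # exiting at the last layer is forced
        p_exit = policy(pooled)
        if torch.bernoulli(p_exit).item() == 1:
            return i, log_prob + torch.log(p_exit + 1e-8)
        log_prob = log_prob + torch.log(1.0 - p_exit + 1e-8)

During training, only the internal classifier at the sampled exit layer receives the classification loss, so the layers optimized are exactly the ones used at inference; this is the consistency the method's name refers to.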
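The Memorized Layer and the hardness-weighted reward can be sketched as below; the exact weighting used in the paper may differ, so treat memorized_layer and reward as assumed, illustrative forms rather than the repo's implementation.

def memorized_layer(correct_per_layer):
    # Earliest layer from which every deeper internal classifier
    # predicts the instance correctly; returns the layer count if the
    # instance is never memorized. Larger values mean harder instances.
    m = len(correct_per_layer)
    for i in range(len(correct_per_layer) - 1, -1, -1):
        if not correct_per_layer[i]:
            break
        m = i
    return m

def reward(exit_layer, prob_gold, mem_layer, num_layers):
    # Hardness-weighted trade-off (assumed form): easy instances
    # (small memorized layer) weight the acceleration term more, while
    # hard instances weight the accuracy term more.
    hardness = mem_layer / num_layers            # in [0, 1]
    accuracy_term = prob_gold                    # confidence in the gold label
    speed_term = 1.0 - exit_layer / num_layers   # earlier exit, larger reward
    return hardness * accuracy_term + (1.0 - hardness) * speed_term

For example, an easy instance memorized at layer 2 of 12 (hardness of roughly 0.17) draws most of its reward from exiting early, while an instance memorized at layer 11 is rewarded mainly for predicting the gold label confidently.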
Install the necessary packages with:
$ pip install -r requirements.txt
On encoder-only models, we experimented with six GLUE tasks as well as the MCID and StackOverflow tasks.
On decoder-only models, we experimented with the Alpaca and Dolly datasets and the CNN/DM dataset.
If you find this repo useful for your research, please consider citing our paper:
@misc{zeng2023consistentee,
      title={ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference},
      author={Ziqian Zeng and Yihuai Hong and Hongliang Dai and Huiping Zhuang and Cen Chen},
      year={2023},
      eprint={2312.11882},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}