[AAAI 2024 Main Track] Repository for the paper titled "ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference"

ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference (AAAI 2024)

Because inference acceleration for large language models is a new area, we welcome any advice for improving our work.

ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng$^*$, Yihuai Hong$^*$, Hongliang Dai, Huiping Zhuang, Cen Chen
* equal contribution

  • We propose ConsistentEE, an early exiting method that is consistent between training and inference. It achieves this by formulating early exiting as a reinforcement learning problem.
  • We propose a concept named Memorized Layer to measure the hardness of an instance. We incorporate it into the reward function so that the balance between accuracy and acceleration depends on each instance's hardness.
  • Experimental results show that our method outperforms the baselines on natural language understanding and generation tasks.
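
To make the second point more concrete, here is a minimal sketch of how a hardness-dependent reward could be computed. It is an illustration under stated assumptions, not the code in this repository: the function names `memorized_layer` and `exit_reward`, the parameter `base_alpha`, the reading of the memorized layer as the earliest layer from which every later layer predicts correctly, and the linear form of the reward are all hypothetical. Please refer to the paper for the actual definitions.

```python
from typing import Sequence


def memorized_layer(correct_per_layer: Sequence[bool]) -> int:
    """Hypothetical hardness measure (an assumption, not the paper's code).

    Returns the 1-based index of the earliest layer from which this layer and
    all later layers classify the instance correctly. If the last layer is
    wrong, returns len(correct_per_layer) + 1, meaning "never memorized".
    """
    n = len(correct_per_layer)
    layer = n + 1
    # Walk backwards from the last layer while predictions stay correct.
    for i in range(n - 1, -1, -1):
        if not correct_per_layer[i]:
            break
        layer = i + 1
    return layer


def exit_reward(loss: float, layer: int, num_layers: int,
                mem_layer: int, base_alpha: float = 1.0) -> float:
    """Hypothetical reward for exiting at `layer` (1-based).

    The depth penalty is scaled down for hard instances (late memorized
    layer), so easy instances are pushed to exit early while hard ones
    are pushed towards accuracy.
    """
    hardness = min(mem_layer, num_layers) / num_layers  # in (0, 1]
    alpha = base_alpha * (1.0 - hardness)               # assumed weighting
    return -(loss + alpha * layer / num_layers)


if __name__ == "__main__":
    # Correct at layer 2, forgotten at layer 3, stable from layer 4 onwards.
    print(memorized_layer([False, True, False, True, True]))
    # Same loss, easy instance: exiting at layer 2 scores higher than layer 12.
    print(exit_reward(0.1, 2, 12, mem_layer=1) > exit_reward(0.1, 12, 12, mem_layer=1))
```

In this sketch an instance whose memorized layer is the last layer receives no depth penalty at all, so its reward depends only on the loss. That is one simple way to express "hard instances focus on accuracy"; other weightings are equally plausible.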

Requirements

Install the necessary packages with:

$ pip install -r requirements.txt

Experiments

On encoder-only models, we experimented with six GLUE tasks, the MCID task, and the StackOverflow task.
On decoder-only models, we experimented with the Alpaca/Dolly datasets and the CNN/DM dataset.

BibTeX

If you find this repo useful for your research, please consider citing our paper:

@misc{zeng2023consistentee,
      title={ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference}, 
      author={Ziqian Zeng and Yihuai Hong and Hongliang Dai and Huiping Zhuang and Cen Chen},
      year={2023},
      eprint={2312.11882},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}