Awesome Efficient PLM Papers

Must-read papers on improving efficiency for pre-trained language models.

The paper list is mainly maintained by Lei Li.

Knowledge Distillation

  1. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter NeurIPS 2019 Workshop

    Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf [pdf] [project]

  2. Patient Knowledge Distillation for BERT Model Compression EMNLP 2019

    Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu [pdf] [project]

  3. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models Preprint

    Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova [pdf] [project]

  4. TinyBERT: Distilling BERT for Natural Language Understanding Findings of EMNLP 2020

    Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu [pdf] [project]

  5. DynaBERT: Dynamic BERT with Adaptive Width and Depth NeurIPS 2020

    Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu [pdf] [project]

  6. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing EMNLP 2020

    Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou [pdf] [project]

  7. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers NeurIPS 2020

    Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou [pdf] [project]

  8. BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance EMNLP 2020

    Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin [pdf] [project]

  9. MixKD: Towards Efficient Distillation of Large-scale Language Models ICLR 2021

    Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin [pdf]

  10. Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains ACL-IJCNLP 2021

    Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Jun Huang [pdf]

  11. MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation ACL-IJCNLP 2021

    Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh [pdf]

  12. Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor ACL-IJCNLP 2021

    Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu [pdf] [project]

  13. Weight Distillation: Transferring the Knowledge in Neural Network Parameters ACL-IJCNLP 2021

    Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu [pdf]

  14. Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation ACL-IJCNLP 2021

    Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou [pdf]

  15. MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers Findings of ACL-IJCNLP 2021

    Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei [pdf] [project]

  16. One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers Findings of ACL-IJCNLP 2021

    Chuhan Wu, Fangzhao Wu, Yongfeng Huang [pdf]

  17. Dynamic Knowledge Distillation for Pre-trained Language Models EMNLP 2021

    Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun [pdf] [project]

  18. EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation Findings of EMNLP 2021

    Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang [pdf] [project]

  19. Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression EMNLP 2021

    Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei [pdf] [project]

  20. How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding Findings of EMNLP 2021

    Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh [pdf]

  21. Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers EMNLP 2020

    Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu [pdf] [project]

  22. ALP-KD: Attention-Based Layer Projection for Knowledge Distillation AAAI 2021

    Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu [pdf]

  23. Universal-KD: Attention-based Output-Grounded Intermediate Layer Knowledge Distillation EMNLP 2021

    Yimeng Wu, Mehdi Rezagholizadeh, Abbas Ghaddar, Md Akmal Haidar, Ali Ghodsi [pdf]
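
Most of the papers above build on the basic soft-label distillation objective: the student matches the teacher's temperature-softened output distribution while also fitting the gold labels. A minimal NumPy sketch of that loss (illustrative only; function names and hyperparameters are not taken from any specific paper above):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax (numerically stabilized)."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend KL(teacher || student) on softened distributions with
    hard-label cross-entropy on the student's own predictions."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # soft-label term; the conventional T^2 factor keeps gradient scale comparable
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    # hard-label cross-entropy at temperature 1
    q_s = softmax(student_logits)
    ce = -np.log(q_s[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce))
```

Many entries in this section replace or augment the soft-label term, e.g. with intermediate-layer or attention-relation matching.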

Dynamic Early Exiting

  1. DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference ACL 2020

    Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin [pdf] [project]

  2. FastBERT: a Self-distilling BERT with Adaptive Inference Time ACL 2020

    Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju [pdf] [project]

  3. The Right Tool for the Job: Matching Model and Instance Complexities ACL 2020

    Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith [pdf] [project]

  4. A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models NAACL 2021

    Kaiyuan Liao, Yi Zhang, Xuancheng Ren, Qi Su, Xu Sun, Bin He [pdf] [project]

  5. CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade Findings of EMNLP 2021

    Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun [pdf] [project]

  6. Early Exiting BERT for Efficient Document Ranking SustaiNLP 2020

    Ji Xin, Rodrigo Nogueira, Yaoliang Yu, Jimmy Lin [pdf] [project]

  7. BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression EACL 2021

    Ji Xin, Raphael Tang, Yaoliang Yu, Jimmy Lin [pdf] [project]

  8. Accelerating BERT Inference for Sequence Labeling via Early-Exit ACL 2021

    Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang [pdf] [project]

  9. BERT Loses Patience: Fast and Robust Inference with Early Exit NeurIPS 2020

    Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei [pdf] [project]

  10. Early Exiting with Ensemble Internal Classifiers Preprint

    Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu [pdf]

  11. LeeBERT: Learned Early Exit for BERT with Cross-Level Optimization ACL 2021

    Wei Zhu [pdf]

  12. Consistent Accelerated Inference via Confident Adaptive Transformers EMNLP 2021

    Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay [pdf] [project]

  13. Towards Efficient NLP: A Standard Evaluation and A Strong Baseline NAACL 2022

    Xiangyang Liu, Tianxiang Sun, Junliang He, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu [pdf] [project]

  14. A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation Findings of ACL 2022

    Tianxiang Sun, Xiangyang Liu, Wei Zhu, Zhichao Geng, Lingling Wu, Yilong He, Yuan Ni, Guotong Xie, Xuanjing Huang, Xipeng Qiu [pdf]

  15. SkipBERT: Efficient Inference with Shallow Layer Skipping ACL 2022

    Jue Wang, Ke Chen, Gang Chen, Lidan Shou, Julian McAuley [pdf] [project]
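
The common idea in this section is to attach internal classifiers to intermediate layers and stop forward computation once one of them is confident enough. A minimal sketch of confidence-threshold exiting (pure Python; the data layout and threshold value are illustrative, not from any particular paper above):

```python
def early_exit_predict(per_layer_probs, threshold=0.9):
    """Walk the internal classifiers in order and return
    (prediction, exit_layer) at the first one whose max
    probability reaches the confidence threshold."""
    for layer, probs in enumerate(per_layer_probs, start=1):
        conf = max(probs)
        if conf >= threshold:
            return probs.index(conf), layer
    # no classifier was confident: fall back to the final layer
    probs = per_layer_probs[-1]
    return probs.index(max(probs)), len(per_layer_probs)
```

Entropy- and patience-based criteria (e.g. PABEE) replace the max-probability test but keep the same control flow.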

Quantization

  1. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT AAAI 2020

    Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer [pdf] [project]

  2. TernaryBERT: Distillation-aware Ultra-low Bit BERT EMNLP 2020

    Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu [pdf] [project]

  3. Q8BERT: Quantized 8Bit BERT NeurIPS 2019 Workshop

    Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat [pdf] [project]

  4. BinaryBERT: Pushing the Limit of BERT Quantization ACL-IJCNLP 2021

    Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King [pdf] [project]

  5. Automatic Mixed-Precision Quantization Search of BERT IJCAI 2021

    Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin [pdf]

  6. I-BERT: Integer-only BERT Quantization ICML 2021

    Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer [pdf] [project]

  7. Training with Quantization Noise for Extreme Model Compression ICLR 2021

    Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin [pdf] [project]

  8. Compression of Generative Pre-trained Language Models via Quantization ACL 2022

    Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong [pdf]

  9. BiBERT: Accurate Fully Binarized BERT ICLR 2022

    Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu [pdf]
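
The papers above differ in bit-width and calibration strategy, but the core operation is mapping floating-point weights onto a small integer grid with a learned or computed scale. A minimal sketch of symmetric per-tensor int8 quantization (NumPy; a simplified illustration, not the scheme of any specific paper above):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated
    by scale * q, with q an int8 tensor in [-127, 127]."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale
```

Lower-bit schemes (ternary, binary) shrink the grid further and typically need distillation-aware or quantization-aware training, as TernaryBERT and BinaryBERT discuss.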

Pruning

  1. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned ACL 2019

    Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov [pdf] [project]

  2. Are Sixteen Heads Really Better than One? NeurIPS 2019

    Paul Michel, Omer Levy, Graham Neubig [pdf] [project]

  3. The Lottery Ticket Hypothesis for Pre-trained BERT Networks NeurIPS 2020

    Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin [pdf] [project]

  4. Movement Pruning: Adaptive Sparsity by Fine-Tuning NeurIPS 2020

    Victor Sanh, Thomas Wolf, Alexander M. Rush [pdf] [project]

  5. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning Rep4NLP 2020

    Mitchell A. Gordon, Kevin Duh, Nicholas Andrews [pdf] [project]

  6. Reducing Transformer Depth on Demand with Structured Dropout ICLR 2020

    Angela Fan, Edouard Grave, Armand Joulin [pdf]

  7. When BERT Plays the Lottery, All Tickets Are Winning EMNLP 2020

    Sai Prasanna, Anna Rogers, Anna Rumshisky [pdf] [project]

  8. Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior Findings of EMNLP 2020

    Zi Lin, Jeremiah Liu, Zi Yang, Nan Hua, Dan Roth [pdf]

  9. Structured Pruning of a BERT-based Question Answering Model Preprint

    J.S. McCarley, Rishav Chakravarti, Avirup Sil [pdf]

  10. Structured Pruning of Large Language Models EMNLP 2020

    Ziheng Wang, Jeremy Wohlwend, Tao Lei [pdf] [project]

  11. Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm NAACL 2021

    Dongkuan Xu, Ian E.H. Yen, Jinxi Zhao, Zhibin Xiao [pdf]

  12. Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization ACL 2021

    Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen [pdf] [project]

  13. Block Pruning For Faster Transformers EMNLP 2021

    François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush [pdf] [project]

  14. A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models NeurIPS 2022

    Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou [pdf]
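
Structured pruning methods such as head pruning typically score each unit by an importance proxy (e.g. the sensitivity of the loss to masking that head) and then keep only the top-scoring units. A minimal sketch of that selection step (NumPy; the scoring input and keep ratio are illustrative assumptions, not any specific paper's procedure):

```python
import numpy as np

def prune_heads(head_scores, keep_ratio=0.5):
    """Given one importance score per attention head, return a
    boolean mask that keeps the highest-scoring fraction of heads."""
    n_keep = max(1, int(len(head_scores) * keep_ratio))
    order = np.argsort(head_scores)[::-1]   # heads sorted by descending score
    mask = np.zeros(len(head_scores), dtype=bool)
    mask[order[:n_keep]] = True
    return mask
```

Unstructured approaches (magnitude or movement pruning) apply the same top-k idea at the level of individual weights rather than whole heads.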

Other Methods

  1. Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection arXiv 2019

    Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun [pdf] [project]

  2. Compressing Pre-trained Language Models by Matrix Decomposition AACL 2020

    Matan Ben Noach, Yoav Goldberg [pdf]

  3. LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression COLING 2020

    Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai [pdf]

  4. Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators ACL 2021

    Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Zhi-Yuan Xie, Zhong-Yi Lu and Ji-Rong Wen [pdf] [project]

  5. Exploring Extreme Parameter Compression for Pre-trained Language Models ICLR 2022

    Benyou Wang, Yuxin Ren, Lifeng Shang, Xin Jiang, Qun Liu [pdf] [project]

  6. From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models Findings of EMNLP 2022

    Lei Li, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun [pdf] [project]
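
Several entries in this section compress weight matrices by low-rank factorization: a d_out x d_in matrix W is replaced by two thin factors A @ B, cutting parameters whenever the rank is small. A minimal truncated-SVD sketch (NumPy; a generic illustration, not the decomposition used by any specific paper above):

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Factor W (d_out x d_in) into A (d_out x r) @ B (r x d_in)
    via truncated SVD, the best rank-r approximation in Frobenius norm."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # fold singular values into the left factor
    B = Vt[:rank, :]
    return A, B
```

Matrix product operators and other tensor decompositions generalize this idea to chains of more than two factors.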

Contribution

If you find any related work not included here, do not hesitate to open a PR to help us complete the list.