2022,Alibaba,Knowledge Distillation of Transformer-based Language Models Revisited
2023,Meta,Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
2021,Shandong University,BERT,Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains
2021,Tsinghua,BERT,One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
2023,Microsoft,Knowledge Distillation of Large Language Models
2023,Google,Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
2019,Hugging Face,DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
2023,ICML,Less is More: Task-aware Layer-wise Distillation for Language Model Compression
MixKD: Towards Efficient Distillation of Large-Scale Language Models
Dynamic Knowledge Distillation for Pre-trained Language Models
Distilling Linguistic Context for Language Model Compression
DistiLLM: Towards Streamlined Distillation for Large Language Models
Causal Distillation for Language Models
A Survey on Knowledge Distillation of Large Language Models
Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains
Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing
DISCO: Distilling Counterfactuals with Large Language Models
Large Language Model Meets Graph Neural Network in Knowledge Distillation
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Multi-Granularity Structural Knowledge Distillation for Language Model Compression
How To Train Your (Compressed) Large Language Model
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
Lion: Adversarial Distillation of Proprietary Large Language Models
Distilling Event Sequence Knowledge From Large Language Models
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning
Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination
Cost-effective Distillation of Large Language Models
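
For orientation, a minimal sketch of the vanilla logit-distillation objective (temperature-softened KL plus hard-label cross-entropy) that many of the papers above extend or replace; the function and parameter names below are illustrative and not drawn from any specific paper in this list.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL(teacher || student) over temperature-scaled distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard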