2022,Alibaba,Knowledge Distillation of Transformer-based Language Models Revisited
2023,Meta,Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
2021,Shandong University,BERT,Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains
2021,Tsinghua,BERT,One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
2023,Microsoft,Knowledge Distillation of Large Language Models
2023,Google,Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
2019,Hugging Face,DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
2023,ICML,Less is More: Task-aware Layer-wise Distillation for Language Model Compression
MixKD: Towards Efficient Distillation of Large-Scale Language Models
Dynamic Knowledge Distillation for Pre-trained Language Models
Distilling Linguistic Context for Language Model Compression
DistiLLM: Towards Streamlined Distillation for Large Language Models
Causal Distillation for Language Models
A Survey on Knowledge Distillation of Large Language Models
Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains
Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing
DISCO: Distilling Counterfactuals with Large Language Models
Large Language Model Meets Graph Neural Network in Knowledge Distillation
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Multi-Granularity Structural Knowledge Distillation for Language Model Compression
How To Train Your (Compressed) Large Language Model
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
Lion: Adversarial Distillation of Proprietary Large Language Models
Distilling Event Sequence Knowledge From Large Language Models
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning
Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination
Cost-effective Distillation of Large Language Models
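
For orientation, a minimal sketch of the vanilla logit-distillation objective (temperature-softened KL plus hard-label cross-entropy) that many of the papers above extend or replace; the function and parameter names below are illustrative and not drawn from any specific paper in this list.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL(teacher || student) over temperature-scaled distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard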