mydistill

General distillation techniques

2022, Alibaba, Knowledge Distillation of Transformer-based Language Models Revisited

2023, Meta, Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

2021, Shandong University, BERT, Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains

2021, Tsinghua, BERT, One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers

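As background for the entries above, here is a minimal sketch of the classic soft-label distillation loss (temperature-scaled KL against the teacher plus hard-label cross-entropy) that these general-purpose methods build on. The function name, the alpha/temperature defaults, and the overall setup are illustrative assumptions, not taken from any of the papers listed.

```python
# Illustrative sketch only: a standard Hinton-style KD objective in PyTorch.
# All names and default values are assumptions for this example.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Soft-label KL (teacher -> student) blended with hard-label cross-entropy."""
    # Soften both distributions with the temperature before matching them.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 so gradients stay comparable to CE.
    soft_loss = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    # Usual supervised cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

The temperature squared factor keeps the soft-label term's gradients on the same scale as the hard-label term when the temperature changes; alpha simply weights the two terms.
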
LLM distillation techniques

2023, Microsoft, Knowledge Distillation of Large Language Models

2023, Google, Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

2019, Hugging Face, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

2023, ICML, Less is More: Task-aware Layer-wise Distillation for Language Model Compression

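For the LLM-focused entries above, distillation is typically applied per token over the next-token distributions of a causal language model. Below is a minimal sketch under assumed shapes and names, showing both the standard forward KL and the reverse-KL variant (the student-to-teacher direction studied in "Knowledge Distillation of Large Language Models"); it is not the exact recipe of any paper in this list.

```python
# Illustrative sketch only: token-level distillation for causal LMs, with a
# switch between forward KL and the reverse (student -> teacher) direction.
# Shapes, names, and defaults are assumptions for this example.
import torch.nn.functional as F

def token_level_kd(student_logits, teacher_logits, attention_mask,
                   temperature=1.0, reverse=False):
    """Per-token KL between next-token distributions, averaged over real tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    """
    log_q_s = F.log_softmax(student_logits / temperature, dim=-1)
    log_q_t = F.log_softmax(teacher_logits / temperature, dim=-1)
    if reverse:
        # Reverse KL: KL(student || teacher), mode-seeking.
        per_token = (log_q_s.exp() * (log_q_s - log_q_t)).sum(dim=-1)
    else:
        # Forward KL: KL(teacher || student), mass-covering (the common default).
        per_token = (log_q_t.exp() * (log_q_t - log_q_s)).sum(dim=-1)
    mask = attention_mask.to(per_token.dtype)
    # Average over non-padding tokens; T^2 rescales as in classic KD.
    return (per_token * mask).sum() / mask.sum() * temperature ** 2
```
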
Unfiltered:

MixKD: Towards Efficient Distillation of Large-scale Language Models

Dynamic Knowledge Distillation for Pre-trained Language Models

Distilling Linguistic Context for Language Model Compression

DistiLLM: Towards Streamlined Distillation for Large Language Models

Causal Distillation for Language Models

A Survey on Knowledge Distillation of Large Language Models

Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains

Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing

DISCO: Distilling Counterfactuals with Large Language Models

Large Language Model Meets Graph Neural Network in Knowledge Distillation

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

Multi-Granularity Structural Knowledge Distillation for Language Model Compression

How To Train Your (Compressed) Large Language Model

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

Lion: Adversarial Distillation of Proprietary Large Language Models

Distilling Event Sequence Knowledge From Large Language Models

CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning

Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination

Cost-effective Distillation of Large Language Models