NLP and Multimodal Distillation

NLP

Pre-training

(1) MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. [link]

(2) Contrastive Distillation on Intermediate Representations for Language Model Compression. [link]

(3) DisCo: Effective Knowledge Distillation For Contrastive Learning of Sentence Embeddings. [link]

Text Retrieval

(1) Adversarial Retriever-Ranker for dense text retrieval. [link]

(2) LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval. [link]

(3) EMBEDDISTILL: A Geometric Knowledge Distillation For Information Retrieval. [link]

Neural Ranking

(1) In defense of dual-encoders for neural ranking. [link]

Multimodal Learning

(1) Masked Autoencoders Enable Efficient Knowledge Distillers. [link]

(2) Relational Knowledge Distillation. [link]

(3) MiniVLM: A Smaller and Faster Vision-Language Model. [link]

(4) ADVL: Adaptive Distillation For Vision-Language Task. [link]

(5) The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation. [link]

(6) Compressing Visual-linguistic Model via Knowledge Distillation. [link]

(7) Distilled Dual-Encoder Model for Vision-Language Understanding. [link]

(8) XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding. [link]

(9) Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks. [link]

(10) Multi-modal Alignment using Representation Codebook. [link]