- Awesome Knowledge-Distillation
- Distilling the knowledge in a neural network. Hinton et al. arXiv:1503.02531
- Learning from Noisy Labels with Distillation. Li, Yuncheng et al. ICCV 2017
- Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students. arXiv:1805.05551
- Knowledge distillation by on-the-fly native ensemble. Lan, Xu et al. NIPS 2018
- Learning Metrics from Teachers: Compact Networks for Image Embedding. Yu, Lu et al. CVPR 2019
- Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. Huang, Zehao and Wang, Naiyan. 2017
- On Knowledge Distillation from Complex Networks for Response Prediction. Arora, Siddhartha et al. NAACL 2019
- On the Efficacy of Knowledge Distillation. Cho, Jang Hyun and Hariharan, Bharath. arXiv:1910.01348. ICCV 2019
- Revisit Knowledge Distillation: a Teacher-free Framework (Revisiting Knowledge Distillation via Label Smoothing Regularization). Yuan, Li et al. CVPR 2020 [code]
- Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher. Mirzadeh et al. arXiv:1902.03393
- Ensemble Distribution Distillation. ICLR 2020
- Noisy Collaboration in Knowledge Distillation. ICLR 2020
- On Compressing U-net Using Knowledge Distillation. arXiv:1812.00249
- Distillation-Based Training for Multi-Exit Architectures. Phuong, Mary and Lampert, Christoph H. ICCV 2019
- Self-training with Noisy Student improves ImageNet classification. Xie, Qizhe et al. (Google). CVPR 2020
- Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework. arXiv:1910.12061
- Preparing Lessons: Improve Knowledge Distillation with Better Supervision. arXiv:1911.07471
- Adaptive Regularization of Labels. arXiv:1908.05474
- Positive-Unlabeled Compression on the Cloud. Xu, Yixing (HUAWEI) et al. NIPS 2019
- Snapshot Distillation: Teacher-Student Optimization in One Generation. Yang, Chenglin et al. CVPR 2019
- QUEST: Quantized embedding space for transferring knowledge. Jain, Himalaya et al. CVPR 2020 (pre)
- Conditional teacher-student learning. Z. Meng et al. ICASSP 2019
- Subclass Distillation. Müller, Rafael et al. arXiv:2002.03936
- MarginDistillation: distillation for margin-based softmax. Svitov, David & Alyamkin, Sergey. arXiv:2003.02586
- An Embarrassingly Simple Approach for Knowledge Distillation. Gao, Mengya et al. MLR 2018
- Sequence-Level Knowledge Distillation. Kim, Yoon & Rush, Alexander M. arXiv:1606.07947
- Boosting Self-Supervised Learning via Knowledge Transfer. Noroozi, Mehdi et al. CVPR 2018
- Meta Pseudo Labels. Pham, Hieu et al. ICML 2020
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model. CVPR 2020 [code]
- Distilled Binary Neural Network for Monaural Speech Separation. Chen, Xiuyi et al. IJCNN 2018
- Teacher-Class Network: A Neural Network Compression Mechanism. Malik et al. arXiv:2004.03281
- Deeply-supervised knowledge synergy. Sun, Dawei et al. CVPR 2019
- What it Thinks is Important is Important: Robustness Transfers through Input Gradients. Chan, Alvin et al. CVPR 2020
- Triplet Loss for Knowledge Distillation. Oki, Hideki et al. IJCNN 2020
- Role-Wise Data Augmentation for Knowledge Distillation. ICLR 2020 [code]
- Distilling Spikes: Knowledge Distillation in Spiking Neural Networks. arXiv:2005.00288
- Improved Noisy Student Training for Automatic Speech Recognition. Park et al. arXiv:2005.09629
- Learning from a Lightweight Teacher for Efficient Knowledge Distillation. Yuang Liu et al. arXiv:2005.09163
- ResKD: Residual-Guided Knowledge Distillation. Li, Xuewei et al. arXiv:2006.04719
- Distilling Effective Supervision from Severe Label Noise. Zhang, Zizhao et al. CVPR 2020 [code]
- Knowledge Distillation Meets Self-Supervision. Xu, Guodong et al. arXiv:2006.07114 [code]
- Self-supervised Knowledge Distillation for Few-shot Learning. arXiv:2006.09785 [code]
- Fitnets: Hints for thin deep nets. Romero, Adriana et al. arXiv:1412.6550
- Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. Zagoruyko et al. ICLR 2017
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks. Zhang, Zhi et al. arXiv:1710.09505
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. Yim, Junho et al. CVPR 2017
- Paraphrasing complex network: Network compression via factor transfer. Kim, Jangho et al. NIPS 2018
- Knowledge transfer with jacobian matching. ICML 2018
- Self-supervised knowledge distillation using singular value decomposition. Lee, Seung Hyun et al. ECCV 2018
- Learning Deep Representations with Probabilistic Knowledge Transfer. Passalis et al. ECCV 2018
- Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
- Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
- Knowledge Distillation via Route Constrained Optimization. Jin, Xiao et al. ICCV 2019
- Similarity-Preserving Knowledge Distillation. Tung, Frederick and Mori, Greg. ICCV 2019
- MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang, He, Zhankui, and Xue, Xiangyang. AAAI 2019
- A Comprehensive Overhaul of Feature Distillation. Heo, Byeongho et al. ICCV 2019
- Distilling Object Detectors with Fine-grained Feature Imitation. Wang, Tao et al. CVPR 2019
- Knowledge Squeezed Adversarial Network Compression. Changyong, Shu et al. AAAI 2020
- Stagewise Knowledge Distillation. Kulkarni, Akshay et al. arXiv:1911.06786
- Knowledge Distillation from Internal Representations. AAAI 2020
- Knowledge Flow: Improve Upon Your Teachers. ICLR 2019
- LIT: Learned Intermediate Representation Training for Model Compression. ICML 2019
- Improving the Adversarial Robustness of Transfer Learning via Noisy Feature Distillation. Chin, Ting-wu et al. arXiv:2002.02998
- Knapsack Pruning with Inner Distillation. Aflalo, Yonathan et al. arXiv:2002.08258
- Residual Knowledge Distillation. Gao, Mengya et al. arXiv:2002.09168
- Knowledge distillation via adaptive instance normalization. Yang, Jing et al. arXiv:2003.04289
- Bert-of-Theseus: Compressing bert by progressive module replacing. Xu, Canwen et al. arXiv:2002.02925 [code]
- Distilling Spikes: Knowledge Distillation in Spiking Neural Networks. arXiv:2005.00727
- Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks. Meet et al. arXiv:2005.08110
- Feature-map-level Online Adversarial Knowledge Distillation. Chung, Inseop et al. ICML 2020
- Channel Distillation: Channel-Wise Attention for Knowledge Distillation. Zhou, Zaida et al. arXiv:2006.01683 [code]
- Graph-based Knowledge Distillation by Multi-head Attention Network. Lee, Seunghyun and Song, Byung Cheol. arXiv:1907.02226
- Graph Representation Learning via Multi-task Knowledge Distillation. arXiv:1911.05700
- Deep geometric knowledge distillation with graphs. arXiv:1911.03080
- Better and faster: Knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification. IJCAI 2018
- Distilling Knowledge from Graph Convolutional Networks. Yang, Yiding et al. CVPR 2020
- Correlation Congruence for Knowledge Distillation. Peng, Baoyun et al. ICCV 2019
- Similarity-Preserving Knowledge Distillation. Tung, Frederick and Mori, Greg. ICCV 2019
- Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
- Contrastive Representation Distillation. Tian, Yonglong et al. ICLR 2020 [RepDistill]
- Online Knowledge Distillation via Collaborative Learning. Guo, Qiushan et al. CVPR 2020
- Peer Collaborative Learning for Online Knowledge Distillation. Wu, Guile & Gong, Shaogang. arXiv:2006.04147
- Moonshine: Distilling with Cheap Convolutions. Crowley, Elliot J. et al. NIPS 2018
- Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation. Zhang, Linfeng et al. ICCV 2019
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
- BAM! Born-Again Multi-Task Networks for Natural Language Understanding. Clark, Kevin et al. ACL 2019 (short)
- Self-Knowledge Distillation in Natural Language Processing. Hahn, Sangchul and Choi, Heeyoul. arXiv:1908.01851
- Rethinking Data Augmentation: Self-Supervision and Self-Distillation. Lee, Hankook et al. ICLR 2020
- MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks. arXiv:1911.09418
- Self-Distillation Amplifies Regularization in Hilbert Space. Mobahi, Hossein et al. arXiv:2002.05715
- MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. Wang, Wenhui et al. arXiv:2002.10957
- Regularizing Class-wise Predictions via Self-knowledge Distillation. CVPR 2020 [code]
- Self-Distillation as Instance-Specific Label Smoothing. Zhang, Zhilu & Sabuncu, Mert R. arXiv:2006.05065
- Paraphrasing Complex Network: Network Compression via Factor Transfer. Kim, Jangho et al. NIPS 2018
- Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
- Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
- Contrastive Representation Distillation. Tian, Yonglong et al. ICLR 2020
- Teaching To Teach By Structured Dark Knowledge. ICLR 2020
- Inter-Region Affinity Distillation for Road Marking Segmentation. Hou, Yuenan et al. CVPR 2020 [code]
- Heterogeneous Knowledge Distillation using Information Flow Modeling. Passalis et al. CVPR 2020 [code]
- Learning using privileged information: similarity control and knowledge transfer. Vapnik, Vladimir and Izmailov, Rauf. JMLR 2015
- Unifying distillation and privileged information. Lopez-Paz, David et al. ICLR 2016
- Model compression via distillation and quantization. Polino, Antonio et al. ICLR 2018
- KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NIPS 2018
- Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
- Retaining privileged information for multi-task learning. Tang, Fengyi et al. KDD 2019
- A Generalized Meta-loss function for regression and classification using privileged information. Asif, Amina et al. arXiv:1811.06885
- Private Knowledge Transfer via Model Distillation with Generative Adversarial Networks. Gao, Di & Zhuo, Cheng. AAAI 2020
- Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks. Xu, Zheng et al. arXiv:1709.00513
- KTAN: Knowledge Transfer Adversarial Network. Liu, Peiye et al. arXiv:1810.08126
- KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NIPS 2018
- Adversarial Learning of Portable Student Networks. Wang, Yunhe et al. AAAI 2018
- Adversarial Network Compression. Belagiannis, Vasileios et al. ECCV 2018
- Cross-Modality Distillation: A case for Conditional Generative Adversarial Networks. ICASSP 2018
- Adversarial Distillation for Efficient Recommendation with External Knowledge. TOIS 2018
- Training student networks for acceleration with conditional adversarial networks. Xu, Zheng et al. BMVC 2018
- DAFL: Data-Free Learning of Student Networks. Chen, Hanting et al. ICCV 2019
- MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang, He, Zhankui, and Xue, Xiangyang. AAAI 2019
- Knowledge Distillation with Adversarial Samples Supporting Decision Boundary. Heo, Byeongho et al. AAAI 2019
- Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection. Liu, Jian et al. AAAI 2019
- Adversarially Robust Distillation. Goldblum, Micah et al. AAAI 2020
- GAN-Knowledge Distillation for one-stage Object Detection. Hong, Wei et al. arXiv:1906.08467
- Lifelong GAN: Continual Learning for Conditional Image Generation. Kundu et al. arXiv:1908.03884
- Compressing GANs using Knowledge Distillation. Aguinaldo, Angeline et al. arXiv:1902.00159
- Feature-map-level Online Adversarial Knowledge Distillation. ICML 2020
- MineGAN: effective knowledge transfer from GANs to target domains with few images. Wang, Yaxing et al. CVPR 2020
- Distilling portable Generative Adversarial Networks for Image Translation. Chen, Hanting et al. AAAI 2020
- GAN Compression: Efficient Architectures for Interactive Conditional GANs. Junyan Zhu et al. CVPR 2020 [code]
- Adversarial network compression. Belagiannis et al. ECCV 2018
- Few Sample Knowledge Distillation for Efficient Network Compression. Li, Tianhong et al. CVPR 2020
- Learning What and Where to Transfer. Jang, Yunhun et al. ICML 2019
- Transferring Knowledge across Learning Processes. Moreno, Pablo G et al. ICLR 2019
- Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
- Diversity with Cooperation: Ensemble Methods for Few-Shot Classification. Dvornik, Nikita et al. ICCV 2019
- Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation. arXiv:1911.05329v1
- Progressive Knowledge Distillation For Generative Modeling. ICLR 2020
- Few Shot Network Compression via Cross Distillation. AAAI 2020
- Data-Free Knowledge Distillation for Deep Neural Networks. NIPS 2017
- Zero-Shot Knowledge Distillation in Deep Networks. ICML 2019
- DAFL: Data-Free Learning of Student Networks. ICCV 2019
- Zero-shot Knowledge Transfer via Adversarial Belief Matching. Micaelli, Paul and Storkey, Amos. NIPS 2019
- Dream Distillation: A Data-Independent Model Compression Framework. Kartikeya et al. ICML 2019
- Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion. Yin, Hongxu et al. CVPR 2020
- Data-Free Adversarial Distillation. Fang, Gongfan et al. CVPR 2020
- The Knowledge Within: Methods for Data-Free Model Compression. Haroush, Matan et al. CVPR 2020
- Knowledge Extraction with No Observable Data. Yoo, Jaemin et al. NIPS 2019 [code]
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN. CVPR 2020
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier. Addepalli, Sravanti et al. arXiv:1912.11960
- Generative Low-bitwidth Data Free Quantization. Xu, Shoukai et al. arXiv:2003.03603
- This dataset does not exist: training models from generated images. arXiv:1911.02888
- MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation. Sanjay et al. arXiv:2005.03161
- Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data. Such et al. ICML 2020
- Billion-scale semi-supervised learning for image classification. FAIR. arXiv:1905.00546 [code]
- Other data-free model compression:
- Data-free Parameter Pruning for Deep Neural Networks. Srinivas, Suraj et al. arXiv:1507.06149
- Data-Free Quantization Through Weight Equalization and Bias Correction. Nagel, Markus et al. ICCV 2019
- DAC: Data-free Automatic Acceleration of Convolutional Networks. Li, Xin et al. WACV 2019
- Improving Neural Architecture Search Image Classifiers via Ensemble Learning. Macko, Vladimir et al. arXiv:1903.06236
- Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. Li, Changlin et al. arXiv:1911.13053v1
- Towards Oracle Knowledge Distillation with Neural Architecture Search. Kang, Minsoo et al. AAAI 2020
- Search for Better Students to Learn Distilled Knowledge. Gu, Jindong & Tresp, Volker. arXiv:2001.11612
- Circumventing Outliers of AutoAugment with Knowledge Distillation. Wei, Longhui et al. arXiv:2003.11342
- Network Pruning via Transformable Architecture Search. Dong, Xuanyi & Yang, Yi. NIPS 2019
- Search to Distill: Pearls are Everywhere but not the Eyes. Liu, Yu et al. CVPR 2020
- AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks. Fu, Yonggan et al. ICML 2020 [code]
- N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
- Knowledge Flow: Improve Upon Your Teachers. Liu, Iou-jen et al. ICLR 2019
- Transferring Knowledge across Learning Processes. Moreno, Pablo G et al. ICLR 2019
- Exploration by random network distillation. Burda, Yuri et al. ICLR 2019
- Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning. Hong, Zhang-Wei et al. arXiv:2002.00149
- Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach. Xue, Zeyue et al. arXiv:2002.02202
- Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning. Cha, Han et al. arXiv:2005.06105
- Dual Policy Distillation. Lai, Kwei-Herng et al. arXiv:2006.04061
- Learning from Multiple Teacher Networks. You, Shan et al. KDD 2017
- Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data. ICLR 2017
- Knowledge Adaptation: Teaching to Adapt. arXiv:1702.02052
- Deep Model Compression: Distilling Knowledge from Noisy Teachers. Sau, Bharat Bhusan et al. arXiv:1610.09650v2
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Tarvainen, Antti and Valpola, Harri. NIPS 2017
- Born-Again Neural Networks. Furlanello, Tommaso et al. ICML 2018
- Deep Mutual Learning. Zhang, Ying et al. CVPR 2018
- Knowledge distillation by on-the-fly native ensemble. Lan, Xu et al. NIPS 2018
- Collaborative learning for deep neural networks. Song, Guocong and Chai, Wei. NIPS 2018
- Data Distillation: Towards Omni-Supervised Learning. Radosavovic, Ilija et al. CVPR 2018
- Multilingual Neural Machine Translation with Knowledge Distillation. ICLR 2019
- Unifying Heterogeneous Classifiers with Distillation. Vongkulbhisal et al. CVPR 2019
- Distilled Person Re-Identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
- Diversity with Cooperation: Ensemble Methods for Few-Shot Classification. Dvornik, Nikita et al. ICCV 2019
- Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System. Yang, Ze et al. WSDM 2020
- FEED: Feature-level Ensemble for Knowledge Distillation. Park, SeongUk and Kwak, Nojun. AAAI 2020
- Stochasticity and Skip Connection Improve Knowledge Transfer. Lee, Kwangjin et al. ICLR 2020
- Online Knowledge Distillation with Diverse Peers. Chen, Defang et al. AAAI 2020
- Hydra: Preserving Ensemble Diversity for Model Distillation. Tran, Linh et al. arXiv:2001.04694
- Distilled Hierarchical Neural Ensembles with Adaptive Inference Cost. Ruiz, Adria et al. arXiv:2003.01474
- Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition. Gao, Yan et al. arXiv:2005.09310
- Adaptive Learning for Multi-teacher Multi-student Knowledge Distillation. Yuang Liu et al. 2019
- Amalgamating Knowledge towards Comprehensive Classification. Shen, Chengchao et al. AAAI 2019
- Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers. Ye, Jingwen et al. IJCAI 2019
- Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning. Luo, Sihui et al. IJCAI 2019
- Student Becoming the Master: Knowledge Amalgamation for Joint Scene Parsing, Depth Estimation, and More. Ye, Jingwen et al. CVPR 2019
- Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation. ICCV 2019
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN. CVPR 2020
- SoundNet: Learning Sound Representations from Unlabeled Video. Aytar, Yusuf et al. ECCV 2016
- Cross Modal Distillation for Supervision Transfer. Gupta, Saurabh et al. CVPR 2016
- Emotion recognition in speech using cross-modal transfer in the wild. Albanie, Samuel et al. ACM MM 2018
- Through-Wall Human Pose Estimation Using Radio Signals. Zhao, Mingmin et al. CVPR 2018
- Compact Trilinear Interaction for Visual Question Answering. Do, Tuong et al. ICCV 2019
- Cross-Modal Knowledge Distillation for Action Recognition. Thoker, Fida Mohammad and Gall, Juergen. ICIP 2019
- Learning to Map Nearly Anything. Salem, Tawfiq et al. arXiv:1909.06928
- Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
- UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation. Kundu et al. ICCV 2019
- CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency. Chen, Yun-Chun et al. CVPR 2019
- XD: Cross lingual Knowledge Distillation for Polyglot Sentence Embeddings. ICLR 2020
- Effective Domain Knowledge Transfer with Soft Fine-tuning. Zhao, Zhichen et al. arXiv:1909.02236
- ASR is all you need: cross-modal distillation for lip reading. Afouras et al. arXiv:1911.12747v1
- Knowledge distillation for semi-supervised domain adaptation. arXiv:1908.07355
- Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition. Meng, Zhong et al. arXiv:2001.01798
- Cluster Alignment with a Teacher for Unsupervised Domain Adaptation. ICCV 2019
- Attention Bridging Network for Knowledge Transfer. Li, Kunpeng et al. ICCV 2019
- Unpaired Multi-modal Segmentation via Knowledge Distillation. Dou, Qi et al. arXiv:2001.03111
- Multi-source Distilling Domain Adaptation. Zhao, Sicheng et al. arXiv:1911.11554
- Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing. Hu, Hengtong et al. CVPR 2020
- Improving Semantic Segmentation via Self-Training. Zhu, Yi et al. arXiv:2004.14960
- Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation. arXiv:2005.08213
- Joint Progressive Knowledge Distillation and Unsupervised Domain Adaptation. arXiv:2005.07839
- Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge. Zhao, Long et al. CVPR 2020
- Large-Scale Domain Adaptation via Teacher-Student Learning. Li, Jinyu et al. arXiv:1708.05466
- Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data. Fayek, Haytham M. & Kumar, Anurag. IJCAI 2020
- Distilling Cross-Task Knowledge via Relationship Matching. Ye, Han-Jia et al. CVPR 2020 [code]
- Modality distillation with multiple stream networks for action recognition. Garcia, Nuno C. et al. ECCV 2018
- Face model compression by distilling knowledge from neurons. Luo, Ping et al. AAAI 2016
- Learning efficient object detection models with knowledge distillation. Chen, Guobin et al. NIPS 2017
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. Mishra, Asit et al. ICLR 2018
- Distilled Person Re-identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
- Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
- Fast Human Pose Estimation. Zhang, Feng et al. CVPR 2019
- Distilling knowledge from a deep pose regressor network. Saputra et al. arXiv:1908.00858 (2019)
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
- Structured Knowledge Distillation for Semantic Segmentation. Liu, Yifan et al. CVPR 2019
- Relation Distillation Networks for Video Object Detection. Deng, Jiajun et al. ICCV 2019
- Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection. Dong, Xuanyi and Yang, Yi. ICCV 2019
- Progressive Teacher-student Learning for Early Action Prediction. Wang, Xionghui et al. CVPR 2019
- Lightweight Image Super-Resolution with Information Multi-distillation Network. Hui, Zheng et al. ICCVW 2019
- AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation. Tavakolian, Mohammad et al. ICCV 2019
- Dynamic Kernel Distillation for Efficient Pose Estimation in Videos. Nie, Xuecheng et al. ICCV 2019
- Teacher Guided Architecture Search. Bashivan, Pouya and Tensen, Mark. ICCV 2019
- Online Model Distillation for Efficient Video Inference. Mullapudi et al. ICCV 2019
- Distilling Object Detectors with Fine-grained Feature Imitation. Wang, Tao et al. CVPR 2019
- Knowledge Distillation for Incremental Learning in Semantic Segmentation. arXiv:1911.03462
- MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization. arXiv:1910.12295
- Teacher-Students Knowledge Distillation for Siamese Trackers. arXiv:1907.10586
- LaTeS: Latent Space Distillation for Teacher-Student Driving Policy Learning. Zhao, Albert et al. CVPR 2020 (pre)
- Knowledge Distillation for Brain Tumor Segmentation. arXiv:2002.03688
- ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes. Chen, Yuhua et al. CVPR 2018
- Next Point-of-Interest Recommendation on Resource-Constrained Mobile Devices. WWW 2020
- Multi-Representation Knowledge Distillation For Audio Classification. Gao, Liang et al. arXiv:2002.09607
- Collaborative Distillation for Ultra-Resolution Universal Style Transfer. Wang, Huan et al. CVPR 2020 [code]
- ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference. Chung, Jae-Won et al. arXiv:2003.10735
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning. Zhang, Ziqi et al. CVPR 2020
- Spatio-Temporal Graph for Video Captioning with Knowledge distillation. CVPR 2020 [code]
- Squeezed Deep 6DoF Object Detection Using Knowledge Distillation. Felix, Heitor et al. arXiv:2003.13586
- Distilled Semantics for Comprehensive Scene Understanding from Videos. Tosi, Fabio et al. arXiv:2003.14030
- Parallel WaveNet: Fast high-fidelity speech synthesis. van den Oord, Aaron et al. ICML 2018
- Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning. Wang, Chaoyang et al. ICCV 2019
- KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow. Murugesan et al. MIDL 2020
- Geometry-Aware Distillation for Indoor Semantic Segmentation. Jiao, Jianbo et al. CVPR 2019
- Distill Image Dehazing with Heterogeneous Task Imitation. Hong, Ming et al. CVPR 2020
- Knowledge Distillation for Action Anticipation via Label Smoothing. Camporese et al. arXiv:2004.07711
- More Grounded Image Captioning by Distilling Image-Text Matching Model. Zhou, Yuanen et al. CVPR 2020
- Distilling Knowledge from Refinement in Multiple Instance Detection Networks. Zeni, Luis Felipe & Jung, Claudio. arXiv:2004.10943
- A General Knowledge Distillation Framework for Counterfactual Recommendation via Uniform Data. SIGIR 2020
- Enabling Incremental Knowledge Transfer for Object Detection at the Edge. arXiv:2004.05746
- Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings. Bergmann, Paul et al. CVPR 2020
- TA-Student VQA: Multi-Agents Training by Self-Questioning. Xiong, Peixi & Wu, Ying. CVPR 2020
- Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels. Jiang, Lu et al. ICML 2018
- A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection. Chen, Zhihao et al. CVPR 2020 [code]
- Learning Lightweight Face Detector with Knowledge Distillation. Zhang, Shifeng et al. IEEE 2019
- Learning Lightweight Pedestrian Detector with Hierarchical Knowledge Distillation. ICIP 2019
- Distilling Object Detectors with Task Adaptive Regularization. Sun, Ruoyu et al. arXiv:2006.13108
- Privileged Features Distillation at Taobao Recommendations. Xu, Chen et al. KDD 2020
- Patient Knowledge Distillation for BERT Model Compression. Sun, Siqi et al. arXiv:1908.09355
- TinyBERT: Distilling BERT for Natural Language Understanding. Jiao, Xiaoqi et al. arXiv:1909.10351
- Learning to Specialize with Knowledge Distillation for Visual Question Answering. NIPS 2018
- Knowledge Distillation for Bilingual Dictionary Induction. EMNLP 2017
- A Teacher-Student Framework for Maintainable Dialog Manager. EMNLP 2018
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Sanh, Victor et al. arXiv:1910.01108
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Turc, Iulia et al. arXiv:1908.08962
- On Knowledge distillation from complex networks for response prediction. Arora, Siddhartha et al. NAACL 2019
- Distilling the Knowledge of BERT for Text Generation. arXiv:1911.03829v1
- Understanding Knowledge Distillation in Non-autoregressive Machine Translation. arXiv:1911.02727
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. Sun, Zhiqing et al. ACL 2020
- Acquiring Knowledge from Pre-trained Model to Neural Machine Translation. Weng, Rongxiang et al. AAAI 2020
- TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval. Lu, Wenhao et al. KDD 2020
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation. Xu, Yige et al. arXiv:2002.10345
- FastBERT: a Self-distilling BERT with Adaptive Inference Time. Liu, Weijie et al. ACL 2020
- LightRec: a Memory and Search-Efficient Recommender System. Lian, Defu et al. WWW 2020
- LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression. Mao, Yihuan et al. arXiv:2004.04124
- DynaBERT: Dynamic BERT with Adaptive Width and Depth. Hou, Lu et al. arXiv:2004.04037
- Structure-Level Knowledge Distillation For Multilingual Sequence Labeling. Wang, Xinyu et al. ACL 2020
- Distilled embedding: non-linear embedding factorization using knowledge distillation. Lioutas, Vasileios et al. arXiv:1910.06720
- TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER. Mukherjee & Awadallah. ACL 2020
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation. Sun, Haipeng et al. arXiv:2004.10171
- Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. Reimers, Nils & Gurevych, Iryna. arXiv:2004.09813
- Distilling Knowledge for Fast Retrieval-based Chat-bots. Tahami et al. arXiv:2004.11045
- Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language. ACL 2020
- Local Clustering with Mean Teacher for Semi-supervised Learning. arXiv:2004.09665
- Time Series Data Augmentation for Neural Networks by Time Warping with a Discriminative Teacher. arXiv:2004.08780
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders. arXiv:2005.13482
- Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation. arXiv:2003.02877
- Distilling Neural Networks for Faster and Greener Dependency Parsing. arXiv:2006.00844
- Distilling Knowledge from Well-informed Soft Labels for Neural Relation Extraction. AAAI 2020 [code]
- More Grounded Image Captioning by Distilling Image-Text Matching Model. Zhou, Yuanen et al. CVPR 2020
- Multimodal Learning with Incomplete Modalities by Knowledge Distillation. Wang, Qi et al. KDD 2020
- Accelerating Convolutional Neural Networks with Dominant Convolutional Kernel and Knowledge Pre-regression. ECCV 2016
- N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
- Slimmable Neural Networks. Yu, Jiahui et al. ICLR 2019
- Co-Evolutionary Compression for Unpaired Image Translation. Shu, Han et al. ICCV 2019
- MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning. Liu, Zechun et al. ICCV 2019
- LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning. ICLR 2020
- Pruning with hints: an efficient framework for model acceleration. ICLR 2020
- Training convolutional neural networks with cheap convolutions and online distillation. arXiv:1909.13063
- Cooperative Pruning in Cross-Domain Deep Neural Network Compression. Chen, Shangyu et al. IJCAI 2019
- QKD: Quantization-aware Knowledge Distillation. Kim, Jangho et al. arXiv:1911.12491v1
- Neural Network Pruning with Residual-Connections and Limited-Data. Luo, Jian-Hao & Wu, Jianxin. CVPR 2020
- Training Quantized Neural Networks with a Full-precision Auxiliary Module. Zhuang, Bohan et al. CVPR 2020
- Towards Effective Low-bitwidth Convolutional Neural Networks. Zhuang, Bohan et al. CVPR 2018
- Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations. Zhuang, Bohan et al. arXiv:1908.04680
- Paying more attention to snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation. Le et al. arXiv:2006.11487 [code]
- Do deep nets really need to be deep? Ba, Jimmy and Caruana, Rich. NIPS 2014
- When Does Label Smoothing Help? Müller, Rafael, Kornblith, and Hinton. NIPS 2019
- Towards Understanding Knowledge Distillation. Phuong, Mary and Lampert, Christoph. AAAI 2019
- Harnessing deep neural networks with logical rules. ACL 2016
- Adaptive Regularization of Labels. Ding, Qianggang et al. arXiv:1908.05474
- Knowledge Isomorphism between Neural Networks. Liang, Ruofan et al. arXiv:1908.01581
- Neural Network Distiller: A Python Package For DNN Compression Research. arXiv:1910.12232
- (survey) Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation. arXiv:1912.13179
- Understanding and Improving Knowledge Distillation. Tang, Jiaxi et al. arXiv:2002.03532
- The State of Knowledge Distillation for Classification. Ruffy, Fabian and Chahal, Karanbir. arXiv:1912.10850 [code]
- TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing. HIT and iFLYTEK. arXiv:2002.12620
- Explaining Knowledge Distillation by Quantifying the Knowledge. Zhang, Quanshi et al. CVPR 2020
- DeepVID: deep visual interpretation and diagnosis for image classifiers via knowledge distillation. IEEE Trans, 2019.
- On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime. Rahbar, Arman et al. arXiv:2003.13438
- (survey) Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks. Wang, Lin & Yoon, Kuk-Jin. arXiv:2004.05937
- Why distillation helps: a statistical perspective. arXiv:2005.10419
- Transferring Inductive Biases through Knowledge Distillation. Abnar, Samira et al. arXiv:2006.00555
- Does label smoothing mitigate label noise? Lukasik, Michal et al. ICML 2020
- An Empirical Analysis of the Impact of Data Augmentation on Knowledge Distillation. Das, Deepan et al. arXiv:2006.03810
- Knowledge Distillation: A Survey. Gou, Jianping et al. arXiv:2006.05525
- Does Adversarial Transferability Indicate Knowledge Transferability? Liang, Kaizhao et al. arXiv:2006.14512
Note: PDFs of all the papers can be found and downloaded via Bing or Google.
Source: https://github.com/FLHonker/Awesome-Knowledge-Distillation
Contact: Yuang Liu (frankliu624@outlook.com), AIDA, ECNU.