A curated list of awesome distillation techniques designed for object detectors.
Parameter compression and accuracy boosting are core problems in putting object detectors to practical use, and knowledge distillation (KD) is one of the most popular solutions. KD trains a compact model (the student) by transferring knowledge from a high-capacity model (the teacher); a minimal sketch of the vanilla KD loss follows. Papers and code are listed.
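For reference, the classic soft-label KD objective underlies most of the detector-specific methods below. A minimal PyTorch sketch, assuming raw (pre-softmax) logits; all names are illustrative:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 4.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```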
- Knowledge Distillation for General Object Detectors
- Knowledge Distillation for Specific Object Detectors
- Knowledge Distillation for Heterogeneous Object Detectors
- Teacher Free Knowledge Distillation for Object Detectors
- Miscellaneous
- Newly Published Papers
NeurIPS 2017. [NeurIPS] - A new framework to learn compact and fast object detection networks with improved accuracy using knowledge distillation and hint learning (hint learning is sketched below).
- Learning Efficient Object Detection Models with Knowledge Distillation
- Guobin Chen and Wongun Choi and Xiang Yu and Tony Han and Manmohan Chandraker
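A hedged sketch of the hint-learning component: a 1x1 adaptation layer maps the student's intermediate feature to the teacher's channel width, and an L2 loss pulls it toward the teacher's feature. Layer shapes and names are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """L2 'hint' loss between adapted student features and teacher features."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv adapter so student and teacher features become comparable.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        return F.mse_loss(self.adapter(f_student), f_teacher.detach())
```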
Mimic. CVPR 2017. [CVF] [IEEE Xplore] - A fully convolutional feature mimic framework to train very efficient CNN based detectors, which do not need ImageNet pre-training and achieve competitive performance as the large and slow models.
- Mimicking Very Efficient Network for Object Detection
- Quanquan Li and Shengying Jin and Junjie Yan
FGFI. CVPR 2019. [CVF] [IEEE Xplore] [arXiv] <GitHub> - A fine-grained feature imitation method exploiting the cross-location discrepancy of feature response.
- Distilling Object Detectors With Fine-Grained Feature Imitation
- Tao Wang and Li Yuan and Xiaopeng Zhang and Jiashi Feng
DeFeat. CVPR 2021. [CVF] [IEEE Xplore] [arXiv] - A novel distillation algorithm via decoupled features (foreground vs. background, as sketched below) for learning a better student detector.
- Distilling Object Detectors via Decoupled Features
- Jianyuan Guo and Kai Han and Yunhe Wang and Han Wu and Xinghao Chen and Chunjing Xu and Chang Xu
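A minimal sketch of decoupled feature distillation in the spirit of DeFeat: foreground and background regions are distilled separately with their own weights. The binary mask and the weight values are assumptions:

```python
import torch

def decoupled_feat_loss(f_student: torch.Tensor,   # (N, C, H, W)
                        f_teacher: torch.Tensor,   # (N, C, H, W)
                        fg_mask: torch.Tensor,     # (N, 1, H, W), 1 on objects
                        w_fg: float = 2.0,
                        w_bg: float = 0.5) -> torch.Tensor:
    diff = (f_student - f_teacher.detach()) ** 2
    bg_mask = 1.0 - fg_mask
    # Normalize each term by its own region size so fg and bg are balanced.
    fg = (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
    bg = (diff * bg_mask).sum() / bg_mask.sum().clamp(min=1.0)
    return w_fg * fg + w_bg * bg
```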
FRS. NeurIPS 2021. [NeurIPS] [OpenReview] [arXiv] - A novel Feature-Richness Score (FRS) method to choose important features that improve generalized detectability during distillation (a hedged sketch follows).
- Distilling Object Detectors with Feature Richness
- Zhixing Du and Rui Zhang and Ming Chang and Xishan Zhang and Shaoli Liu and Tianshi Chen and Yunji Chen
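One way to read the FRS idea: the teacher's per-location classification confidence re-weights feature imitation so that feature-rich regions dominate. This sketch is a simplification (single FPN level, max over classes), not the paper's exact score:

```python
import torch

def frs_weighted_loss(f_student: torch.Tensor,          # (N, C, H, W)
                      f_teacher: torch.Tensor,          # (N, C, H, W)
                      teacher_cls_logits: torch.Tensor  # (N, num_classes, H, W)
                      ) -> torch.Tensor:
    # Feature-richness proxy: the teacher's highest class confidence per location.
    score = teacher_cls_logits.detach().sigmoid().amax(dim=1, keepdim=True)
    diff = (f_student - f_teacher.detach()) ** 2
    return (diff * score).sum() / score.sum().clamp(min=1e-6)
```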
PGD. ECCV 2022. [ECVA] [Springer] [arXiv] <GitHub> - Distills only on the teacher's key predictive regions.
- Prediction-Guided Distillation for Dense Object Detection
- Chenhongyi Yang and Mateusz Ochal and Amos Storkey and Elliot J Crowley
TBD. PR. [ScienceDirect] [arXiv] - Alleviates the misalignment between classification score and localization quality via Harmony Score and Task-Balanced Distillation.
- Task-balanced distillation for object detection
- Ruining Tang and Zhenyu Liu and Yangguang Li and Yiguo Song and Hui Liu and Qide Wang and Jing Shao and Guifang Duan and Jianrong Tan
FKD. ICLR 2021. [OpenReview] <GitHub> - Attention-guided distillation and non-local distillation.
- Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors
- Linfeng Zhang and Kaisheng Ma
FKD. TPAMI. [IEEE Xplore] - A structured knowledge distillation scheme, including attention-guided distillation and non-local distillation.
- Structured Knowledge Distillation for Accurate and Efficient Object Detection
- Linfeng Zhang and Kaisheng Ma
FGD. CVPR 2022. [CVF] [IEEE Xplore] [arXiv] <GitHub> - Focal distillation separates the foreground and background (see the attention sketch below), while global distillation rebuilds the relation between different pixels and transfers it from teachers to students.
- Focal and Global Knowledge Distillation for Detectors
- Zhendong Yang and Zhe Li and Xiaohu Jiang and Yuan Gong and Zehuan Yuan and Danpei Zhao and Chun Yuan
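A hedged sketch of the teacher-derived attention weighting used in FGD's focal term (the foreground/background split itself mirrors the DeFeat sketch above). The paper's softmax temperatures are omitted for brevity:

```python
import torch

def attention_weighted_diff(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    f_t = f_t.detach()
    n, c, h, w = f_t.shape
    # Spatial attention: positions where the teacher's activations are large.
    spatial = torch.softmax(f_t.abs().mean(1).view(n, -1), dim=1).view(n, 1, h, w) * (h * w)
    # Channel attention: channels the teacher emphasizes.
    channel = torch.softmax(f_t.abs().mean(dim=(2, 3)), dim=1).view(n, c, 1, 1) * c
    return ((f_s - f_t) ** 2 * spatial * channel).sum()
```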
GLAMD. ECCV 2022. [ECVA] [Springer] - Divides the feature maps into several patches and applies an attention mechanism to both the entire feature area and each patch.
- GLAMD: Global and Local Attention Mask Distillation for Object Detectors
- Younho Jang and Wheemyung Shin and Jinbeom Kim and Simon Woo and Sung-Ho Bae
CD. ICCV 2021. [CVF] [IEEE Xplore] [arXiv] <GitHub> - Normalizes the activation map of each channel into a soft probability map and minimizes the KL divergence between the teacher's and student's maps (sketch below).
- Channel-wise Knowledge Distillation for Dense Prediction
- Changyong Shu and Yifan Liu and Jianfei Gao and Zheng Yan and Chunhua Shen
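A sketch of channel-wise distillation, assuming a single feature level; the per-channel spatial softmax and KL divergence follow the paper's formulation, while the normalization constant is simplified:

```python
import torch
import torch.nn.functional as F

def cwd_loss(f_student: torch.Tensor,   # (N, C, H, W)
             f_teacher: torch.Tensor,   # (N, C, H, W)
             tau: float = 4.0) -> torch.Tensor:
    n, c, h, w = f_student.shape
    s = f_student.view(n, c, h * w)
    t = f_teacher.detach().view(n, c, h * w)
    # Each channel becomes a probability distribution over spatial positions.
    log_p_s = F.log_softmax(s / tau, dim=-1)
    p_t = F.softmax(t / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (tau ** 2)
```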
DRKD. IJCAI 2023. [arXiv] - Dual relation knowledge distillation: pixel-wise relation distillation and instance-wise relation distillation (a generic pixel-relation sketch follows).
- Dual Relation Knowledge Distillation for Object Detection
- Zhenliang Ni and Fukui Yang and Shengzhao Wen and Gang Zhang
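A hedged sketch of a pixel-wise relation term: pairwise similarities between L2-normalized pixel embeddings form a relation matrix, and the student's matrix is matched to the teacher's. This is a generic relation loss in the spirit of DRKD, not its exact formulation:

```python
import torch
import torch.nn.functional as F

def pixel_relation_loss(f_student: torch.Tensor,   # (N, C, H, W)
                        f_teacher: torch.Tensor) -> torch.Tensor:
    def relation(f: torch.Tensor) -> torch.Tensor:
        n, c, h, w = f.shape
        x = F.normalize(f.view(n, c, h * w), dim=1)   # unit-norm pixel embeddings
        return torch.bmm(x.transpose(1, 2), x)        # (N, HW, HW) similarities
    # In practice the features are usually pooled first, since HW x HW is large.
    return F.mse_loss(relation(f_student), relation(f_teacher.detach()))
```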
GID. CVPR 2021. [CVF] [IEEE Xplore] [arXiv] - A novel distillation method for detection tasks based on discriminative instances, selected without regard to the positive/negative distinction given by the ground truth.
- General Instance Distillation for Object Detection
- Xing Dai and Zeren Jiang and Zhao Wu and Yiping Bao and Zhicheng Wang and Si Liu and Erjin Zhou
DSIG. ICCV 2021. [CVF] [IEEE Xplore] [arXiv] <GitHub> - A simple knowledge structure to exploit and encode information inside the detection system to facilitate detector knowledge distillation.
- Deep Structured Instance Graph for Distilling Object Detectors
- Yixin Chen and Pengguang Chen and Shu Liu and Liwei Wang and Jiaya Jia
ICD. NeurIPS 2021. [NeurIPS] [OpenReview] [arXiv] <GitHub> - An instance-conditional distillation framework that locates the knowledge desired by each instance.
- Instance-Conditional Knowledge Distillation for Object Detection
- Zijian Kang and Peizhen Zhang and Xiangyu Zhang and Jian Sun and Nanning Zheng
LAD. WACV 2022. [CVF] [IEEE Xplore] [arXiv] <MMDet> - Uses the teacher network to guide the student through hard labels that the teacher dynamically assigns, rather than through soft predictions.
- Improving Object Detection by Label Assignment Distillation
- Chuong H. Nguyen and Thuy C. Nguyen and Tuan N. Tang and Nam L. H. Phan
TADF. [arXiv] - A general distillation framework that adaptively transfers knowledge from teacher to student according to the task-specific prior.
- Distilling Object Detectors with Task Adaptive Regularization
- Ruoyu Sun and Fuhui Tang and Xiaopeng Zhang and Hongkai Xiong and Qi Tian
BCKD. ICCV 2023. [CVF] [IEEE Xplore] [arXiv] - A novel distillation method with cross-task consistent protocols, tailored for dense object detection.
- Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
- Longrong Yang and Xianpan Zhou and Xuewei Li and Liang Qiao and Zheyang Li and Ziwei Yang and Gaoang Wang and Xi Li
AAAI 2022. [AAAI] [arXiv] - RM takes the rank of candidate boxes from teachers as a new form of knowledge to distill. PFI attempts to correlate feature differences with prediction differences.
- Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-Guided Feature Imitation
- Gang Li and Xiang Li and Yujie Wang and Shanshan Zhang and Yichao Wu and Ding Liang
NeurIPS 2022. [OpenReview] [arXiv] <GitHub> - Takes additional contrast and structural cues into account so that feature importance, correlation, and spatial dependence in the feature space enter the loss formulation.
- Structural Knowledge Distillation for Object Detection
- Philip De Rijk and Lukas Schneider and Marius Cordts and Dariu M Gavrila
CrossKD. [arXiv] <GitHub> - Delivers the intermediate features of the student's detection head to the teacher's detection head (a high-level sketch follows).
- CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection
- Jiabao Wang and Yuming Chen and Zhaohui Zheng and Xiang Li and Ming-Ming Cheng and Qibin Hou
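A high-level sketch of the cross-head idea: an intermediate feature from the student's head is passed through the teacher's remaining head layers, and the resulting cross prediction is pulled toward the teacher's own prediction. The nn.Sequential heads, split index, and MSE loss are simplifications:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def crosskd_loss(student_head: nn.Sequential,
                 teacher_head: nn.Sequential,   # frozen (requires_grad=False)
                 f_student: torch.Tensor,
                 f_teacher: torch.Tensor,
                 k: int = 2) -> torch.Tensor:
    # Student's first k head layers produce an intermediate feature ...
    x = f_student
    for layer in list(student_head)[:k]:
        x = layer(x)
    # ... which the teacher's remaining layers turn into a "cross" prediction.
    for layer in list(teacher_head)[k:]:
        x = layer(x)
    with torch.no_grad():
        target = teacher_head(f_teacher)        # teacher's own prediction
    # Gradients flow through the frozen teacher layers back to the student.
    return F.mse_loss(x, target)
```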
LD. CVPR 2022. [CVF] [IEEE Xplore] [arXiv] <GitHub> <MMDet> - Applies standard KD to a general probability-distribution representation of bounding-box localization (sketch below).
- Localization Distillation for Dense Object Detection
- Zhaohui Zheng and Rongguang Ye and Ping Wang and Jun Wang and Dongwei Ren and Wangmeng Zuo
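A sketch of localization distillation under the assumption (as in GFocal-style heads) that each box edge is predicted as a discrete distribution over n bins; shapes and the temperature are illustrative:

```python
import torch
import torch.nn.functional as F

def ld_loss(student_reg_logits: torch.Tensor,   # (num_pos, 4, n_bins)
            teacher_reg_logits: torch.Tensor,   # (num_pos, 4, n_bins)
            tau: float = 10.0) -> torch.Tensor:
    # Standard KD, but over the per-edge localization distributions.
    log_p_s = F.log_softmax(student_reg_logits / tau, dim=-1)
    p_t = F.softmax(teacher_reg_logits.detach() / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (tau ** 2)
```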
DETRDistill. ICCV 2023. [CVF] [arXiv] - A novel knowledge distillation dedicated to DETR-families.
- DETRDistill: A Universal Knowledge Distillation Framework for DETR-families
- Jiahao Chang and Shuo Wang and Guangkai Xu and Zehui Chen and Chenhongyi Yang and Feng Zhao
D^3^ETR. [arXiv] - Distills knowledge in decoder predictions and attention maps from the teacher to the student.
- D^3^ETR: Decoder Distillation for Detection Transformer
- Xiaokang Chen and Jiahui Chen and Yan Liu and Gang Zeng
KD-DETR. [arXiv] - A general knowledge distillation paradigm for DETR with consistent distillation points sampling.
- Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
- Yu Wang and Xin Li and Shengzhao Wen and Fukui Yang and Wanping Zhang and Gang Zhang and Haocheng Feng and Junyu Han and Errui Ding
G-DetKD. ICCV 2021. [CVF] [IEEE Xplore] [arXiv] - A novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels to provide the optimal guidance to the student.
- G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
- Lewei Yao and Renjie Pi and Hang Xu and Wei Zhang and Zhenguo Li and Tong Zhang
HEAD. ECCV 2022. [ECVA] [Springer] [arXiv] <GitHub> - HEtero-Assists Distillation leveraging heterogeneous detection heads as assistants to guide the optimization of the student detector.
- HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
- Luting Wang and Xiaojie Li and Yue Liao and Zeren Jiang and Jianlong Wu and Fei Wang and Chen Qian and Si Liu
PKD. NeurIPS 2022. [OpenReview] [arXiv] <GitHub> - Imitates features with the Pearson correlation coefficient to focus on the relational information from the teacher and relax constraints on the magnitude of the features (sketch below).
- PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient
- Weihan Cao and Yifan Zhang and Jianfei Gao and Anda Cheng and Ke Cheng and Jian Cheng
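A sketch of Pearson-correlation feature imitation: after standardization (zero mean, unit variance), an MSE is equivalent, up to a constant, to maximizing the Pearson correlation between the features. The normalization axes here are an assumption:

```python
import torch
import torch.nn.functional as F

def pkd_loss(f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
    def standardize(f: torch.Tensor) -> torch.Tensor:
        # Zero mean, unit variance per channel across batch and space.
        mean = f.mean(dim=(0, 2, 3), keepdim=True)
        std = f.std(dim=(0, 2, 3), keepdim=True)
        return (f - mean) / (std + 1e-6)
    return F.mse_loss(standardize(f_student), standardize(f_teacher.detach()))
```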
MimicDet. ECCV 2020. [ECVA] [Springer] [arXiv] - A novel and efficient framework to train a one-stage detector by directly mimicking the two-stage features, aiming to bridge the accuracy gap between one-stage and two-stage detectors.
- MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection
- Xin Lu and Quanquan Li and Buyu Li and Junjie Yan
LabelEnc. ECCV 2020. [ECVA] [Springer] [arXiv] <GitHub> - A new intermediate supervision method to boost the training of object detection systems.
- LabelEnc: A New Intermediate Supervision Method for Object Detection
- Miao Hao and Yitao Liu and Xiangyu Zhang and Jian Sun
LGD. AAAI 2022. [AAAI] [arXiv] - The first self-distillation framework for general object detection.
- LGD: Label-Guided Self-Distillation for Object Detection
- Peizhen Zhang and Zijian Kang and Tong Yang and Xiangyu Zhang and Nanning Zheng and Jian Sun
SSD-Det. ICCV 2023. [CVF] [IEEE Xplore] [arXiv] - Mines spatial information to refine inaccurate boxes in a self-distillation fashion.
- Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes
- Di Wu and Pengfei Chen and Xuehui Yu and Guorong Li and Zhenjun Han and Jianbin Jiao
TPAMI. [IEEE Xplore] - A comprehensive survey of KD-based object detection models.
- When Object Detection Meets Knowledge Distillation: A Survey
- Zhihui Li and Pengfei Xu and Xiaojun Chang and Luyao Yang and Yuanyuan Zhang and Lina Yao and Xiaojiang Chen
ScaleKD. CVPR 2023. [CVF] - Consists of a Scale-Decoupled Feature distillation module and a Cross-Scale Assistant.
- ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector
- Yichen Zhu and Qiqi Zhou and Ning Liu and Zhiyuan Xu and Zhicai Ou and Xiaofeng Mou and Jian Tang