A curated list of awesome distillation techniques designed for object detectors.
Parameter compression and accuracy boosting are core problems in putting object detectors to practical use, and knowledge distillation (KD) is one of the most popular solutions. KD trains a compact model (the student) by transferring knowledge from a high-capacity model (the teacher); a minimal sketch of the vanilla KD loss follows. Papers and code are listed.
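For reference, the classic soft-label KD objective underlies most of the detector-specific methods below. A minimal PyTorch sketch, assuming raw (pre-softmax) logits; all names are illustrative:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 4.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```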
- Knowledge Distillation for General Object Detectors
- Knowledge Distillation for Specific Object Detectors
- Knowledge Distillation for Heterogeneous Object Detectors
- Teacher Free Knowledge Distillation for Object Detectors
- Miscellaneous
- Newly Published Papers
NeurIPS 2017. [NeurIPS] - A new framework to learn compact and fast object detection networks with improved accuracy using knowledge distillation and hint learning (hint learning is sketched below).
- Learning Efficient Object Detection Models with Knowledge Distillation
- Guobin Chen and Wongun Choi and Xiang Yu and Tony Han and Manmohan Chandraker
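A hedged sketch of the hint-learning component: a 1x1 adaptation layer maps the student's intermediate feature to the teacher's channel width, and an L2 loss pulls it toward the teacher's feature. Layer shapes and names are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """L2 'hint' loss between adapted student features and teacher features."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv adapter so student and teacher features become comparable.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        return F.mse_loss(self.adapter(f_student), f_teacher.detach())
```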
Mimic. CVPR 2017. [CVF] [IEEE Xplore] - A fully convolutional feature mimic framework to train very efficient CNN based detectors, which do not need ImageNet pre-training and achieve competitive performance as the large and slow models.
- Mimicking Very Efficient Network for Object Detection
- Quanquan Li and Shengying Jin and Junjie Yan
FGFI. CVPR 2019. [CVF] [IEEE Xplore] [arXiv] <GitHub> - A fine-grained feature imitation method exploiting the cross-location discrepancy of feature response.
- Distilling Object Detectors With Fine-Grained Feature Imitation
- Tao Wang and Li Yuan and Xiaopeng Zhang and Jiashi Feng
DeFeat. CVPR 2021. [CVF] [IEEE Xplore] [arXiv] - A novel distillation algorithm via decoupled features (foreground vs. background, as sketched below) for learning a better student detector.
- Distilling Object Detectors via Decoupled Features
- Jianyuan Guo and Kai Han and Yunhe Wang and Han Wu and Xinghao Chen and Chunjing Xu and Chang Xu
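A minimal sketch of decoupled feature distillation in the spirit of DeFeat: foreground and background regions are distilled separately with their own weights. The binary mask and the weight values are assumptions:

```python
import torch

def decoupled_feat_loss(f_student: torch.Tensor,   # (N, C, H, W)
                        f_teacher: torch.Tensor,   # (N, C, H, W)
                        fg_mask: torch.Tensor,     # (N, 1, H, W), 1 on objects
                        w_fg: float = 2.0,
                        w_bg: float = 0.5) -> torch.Tensor:
    diff = (f_student - f_teacher.detach()) ** 2
    bg_mask = 1.0 - fg_mask
    # Normalize each term by its own region size so fg and bg are balanced.
    fg = (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
    bg = (diff * bg_mask).sum() / bg_mask.sum().clamp(min=1.0)
    return w_fg * fg + w_bg * bg
```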
FRS. NeurIPS 2021. [NeurIPS] [OpenReview] [arXiv] - A novel Feature-Richness Score (FRS) method to choose important features that improve generalized detectability during distillation (a hedged sketch follows).
- Distilling Object Detectors with Feature Richness
- Zhixing Du and Rui Zhang and Ming Chang and Xishan Zhang and Shaoli Liu and Tianshi Chen and Yunji Chen
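One way to read the FRS idea: the teacher's per-location classification confidence re-weights feature imitation so that feature-rich regions dominate. This sketch is a simplification (single FPN level, max over classes), not the paper's exact score:

```python
import torch

def frs_weighted_loss(f_student: torch.Tensor,          # (N, C, H, W)
                      f_teacher: torch.Tensor,          # (N, C, H, W)
                      teacher_cls_logits: torch.Tensor  # (N, num_classes, H, W)
                      ) -> torch.Tensor:
    # Feature-richness proxy: the teacher's highest class confidence per location.
    score = teacher_cls_logits.detach().sigmoid().amax(dim=1, keepdim=True)
    diff = (f_student - f_teacher.detach()) ** 2
    return (diff * score).sum() / score.sum().clamp(min=1e-6)
```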
PGD. ECCV 2022. [ECVA] [Springer] [arXiv] <GitHub> - Distills only on the teacher's key predictive regions.
- Prediction-Guided Distillation for Dense Object Detection
- Chenhongyi Yang and Mateusz Ochal and Amos Storkey and Elliot J Crowley
TBD. PR. [ScienceDirect] [arXiv] - Alleviates the misalignment between classification score and localization quality via Harmony Score and Task-Balanced Distillation.
- Task-balanced distillation for object detection
- Ruining Tang and Zhenyu Liu and Yangguang Li and Yiguo Song and Hui Liu and Qide Wang and Jing Shao and Guifang Duan and Jianrong Tan
FKD. ICLR 2021. [OpenReview] <GitHub> - Attention-guided distillation and non-local distillation.
- Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors
- Linfeng Zhang and Kaisheng Ma
FKD. TPAMI. [IEEE Xplore] - A structured knowledge distillation scheme, including attention-guided distillation and non-local distillation.
- Structured Knowledge Distillation for Accurate and Efficient Object Detection
- Linfeng Zhang and Kaisheng Ma
FGD. CVPR 2022. [CVF] [IEEE Xplore] [arXiv] <GitHub> - Focal distillation separates the foreground and background (see the attention sketch below), while global distillation rebuilds the relation between different pixels and transfers it from teachers to students.
- Focal and Global Knowledge Distillation for Detectors
- Zhendong Yang and Zhe Li and Xiaohu Jiang and Yuan Gong and Zehuan Yuan and Danpei Zhao and Chun Yuan
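A hedged sketch of the teacher-derived attention weighting used in FGD's focal term (the foreground/background split itself mirrors the DeFeat sketch above). The paper's softmax temperatures are omitted for brevity:

```python
import torch

def attention_weighted_diff(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    f_t = f_t.detach()
    n, c, h, w = f_t.shape
    # Spatial attention: positions where the teacher's activations are large.
    spatial = torch.softmax(f_t.abs().mean(1).view(n, -1), dim=1).view(n, 1, h, w) * (h * w)
    # Channel attention: channels the teacher emphasizes.
    channel = torch.softmax(f_t.abs().mean(dim=(2, 3)), dim=1).view(n, c, 1, 1) * c
    return ((f_s - f_t) ** 2 * spatial * channel).sum()
```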
GLAMD. ECCV 2022. [ECVA] [Springer] - Divides the feature maps into several patches and applies an attention mechanism to both the entire feature area and each patch.
- GLAMD: Global and Local Attention Mask Distillation for Object Detectors
- Younho Jang and Wheemyung Shin and Jinbeom Kim and Simon Woo and Sung-Ho Bae
CD. ICCV 2021. [CVF] [IEEE Xplore] [arXiv] <GitHub> - Normalizes the activation map of each channel into a soft probability map and minimizes the KL divergence between the teacher's and student's maps (sketch below).
- Channel-wise Knowledge Distillation for Dense Prediction
- Changyong Shu and Yifan Liu and Jianfei Gao and Zheng Yan and Chunhua Shen
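A sketch of channel-wise distillation, assuming a single feature level; the per-channel spatial softmax and KL divergence follow the paper's formulation, while the normalization constant is simplified:

```python
import torch
import torch.nn.functional as F

def cwd_loss(f_student: torch.Tensor,   # (N, C, H, W)
             f_teacher: torch.Tensor,   # (N, C, H, W)
             tau: float = 4.0) -> torch.Tensor:
    n, c, h, w = f_student.shape
    s = f_student.view(n, c, h * w)
    t = f_teacher.detach().view(n, c, h * w)
    # Each channel becomes a probability distribution over spatial positions.
    log_p_s = F.log_softmax(s / tau, dim=-1)
    p_t = F.softmax(t / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (tau ** 2)
```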
DRKD. IJCAI 2023. [arXiv] - Dual relation knowledge distillation: pixel-wise relation distillation and instance-wise relation distillation (a generic pixel-relation sketch follows).
- Dual Relation Knowledge Distillation for Object Detection
- Zhenliang Ni and Fukui Yang and Shengzhao Wen and Gang Zhang
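A hedged sketch of a pixel-wise relation term: pairwise similarities between L2-normalized pixel embeddings form a relation matrix, and the student's matrix is matched to the teacher's. This is a generic relation loss in the spirit of DRKD, not its exact formulation:

```python
import torch
import torch.nn.functional as F

def pixel_relation_loss(f_student: torch.Tensor,   # (N, C, H, W)
                        f_teacher: torch.Tensor) -> torch.Tensor:
    def relation(f: torch.Tensor) -> torch.Tensor:
        n, c, h, w = f.shape
        x = F.normalize(f.view(n, c, h * w), dim=1)   # unit-norm pixel embeddings
        return torch.bmm(x.transpose(1, 2), x)        # (N, HW, HW) similarities
    # In practice the features are usually pooled first, since HW x HW is large.
    return F.mse_loss(relation(f_student), relation(f_teacher.detach()))
```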
GID. CVPR 2021. [CVF] [IEEE Xplore] [arXiv] - A novel distillation method for detection tasks based on discriminative instances, selected without regard to the positive/negative distinction given by the ground truth.
- General Instance Distillation for Object Detection
- Xing Dai and Zeren Jiang and Zhao Wu and Yiping Bao and Zhicheng Wang and Si Liu and Erjin Zhou
DSIG. ICCV 2021. [CVF] [IEEE Xplore] [arXiv] <GitHub> - A simple knowledge structure to exploit and encode information inside the detection system to facilitate detector knowledge distillation.
- Deep Structured Instance Graph for Distilling Object Detectors
- Yixin Chen and Pengguang Chen and Shu Liu and Liwei Wang and Jiaya Jia
ICD. NeurIPS 2021. [NeurIPS] [OpenReview] [arXiv] <GitHub> - An instance-conditional distillation framework that locates the knowledge desired by each instance.
- Instance-Conditional Knowledge Distillation for Object Detection
- Zijian Kang and Peizhen Zhang and Xiangyu Zhang and Jian Sun and Nanning Zheng
LAD. WACV 2022. [CVF] [IEEE Xplore] [arXiv] <MMDet> - Uses the teacher network to guide the student through hard labels that the teacher dynamically assigns, rather than through soft predictions.
- Improving Object Detection by Label Assignment Distillation
- Chuong H. Nguyen and Thuy C. Nguyen and Tuan N. Tang and Nam L. H. Phan
TADF. [arXiv] - A general distillation framework that adaptively transfers knowledge from teacher to student according to the task-specific prior.
- Distilling Object Detectors with Task Adaptive Regularization
- Ruoyu Sun and Fuhui Tang and Xiaopeng Zhang and Hongkai Xiong and Qi Tian
BCKD. ICCV 2023. [CVF] [IEEE Xplore] [arXiv] - A novel distillation method with cross-task consistent protocols, tailored for dense object detection.
- Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
- Longrong Yang and Xianpan Zhou and Xuewei Li and Liang Qiao and Zheyang Li and Ziwei Yang and Gaoang Wang and Xi Li
AAAI 2022. [AAAI] [arXiv] - RM takes the rank of candidate boxes from teachers as a new form of knowledge to distill. PFI attempts to correlate feature differences with prediction differences.
- Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-Guided Feature Imitation
- Gang Li and Xiang Li and Yujie Wang and Shanshan Zhang and Yichao Wu and Ding Liang
NeurIPS 2022. [OpenReview] [arXiv] <GitHub> - Takes additional contrast and structural cues into account so that feature importance, correlation, and spatial dependence in the feature space enter the loss formulation.
- Structural Knowledge Distillation for Object Detection
- Philip De Rijk and Lukas Schneider and Marius Cordts and Dariu M Gavrila
CrossKD. [arXiv] <GitHub> - Delivers the intermediate features of the student's detection head to the teacher's detection head (a high-level sketch follows).
- CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection
- Jiabao Wang and Yuming Chen and Zhaohui Zheng and Xiang Li and Ming-Ming Cheng and Qibin Hou
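A high-level sketch of the cross-head idea: an intermediate feature from the student's head is passed through the teacher's remaining head layers, and the resulting cross prediction is pulled toward the teacher's own prediction. The nn.Sequential heads, split index, and MSE loss are simplifications:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def crosskd_loss(student_head: nn.Sequential,
                 teacher_head: nn.Sequential,   # frozen (requires_grad=False)
                 f_student: torch.Tensor,
                 f_teacher: torch.Tensor,
                 k: int = 2) -> torch.Tensor:
    # Student's first k head layers produce an intermediate feature ...
    x = f_student
    for layer in list(student_head)[:k]:
        x = layer(x)
    # ... which the teacher's remaining layers turn into a "cross" prediction.
    for layer in list(teacher_head)[k:]:
        x = layer(x)
    with torch.no_grad():
        target = teacher_head(f_teacher)        # teacher's own prediction
    # Gradients flow through the frozen teacher layers back to the student.
    return F.mse_loss(x, target)
```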
LD. CVPR 2022. [CVF] [IEEE Xplore] [arXiv] <GitHub> <MMDet> - Applies standard KD to a general probability-distribution representation of bounding-box localization (sketch below).
- Localization Distillation for Dense Object Detection
- Zhaohui Zheng and Rongguang Ye and Ping Wang and Jun Wang and Dongwei Ren and Wangmeng Zuo
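A sketch of localization distillation under the assumption (as in GFocal-style heads) that each box edge is predicted as a discrete distribution over n bins; shapes and the temperature are illustrative:

```python
import torch
import torch.nn.functional as F

def ld_loss(student_reg_logits: torch.Tensor,   # (num_pos, 4, n_bins)
            teacher_reg_logits: torch.Tensor,   # (num_pos, 4, n_bins)
            tau: float = 10.0) -> torch.Tensor:
    # Standard KD, but over the per-edge localization distributions.
    log_p_s = F.log_softmax(student_reg_logits / tau, dim=-1)
    p_t = F.softmax(teacher_reg_logits.detach() / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (tau ** 2)
```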
DETRDistill. ICCV 2023. [CVF] [arXiv] - A novel knowledge distillation dedicated to DETR-families.
- DETRDistill: A Universal Knowledge Distillation Framework for DETR-families
- Jiahao Chang and Shuo Wang and Guangkai Xu and Zehui Chen and Chenhongyi Yang and Feng Zhao
D^3^ETR. [arXiv] - Distills knowledge in decoder predictions and attention maps from the teacher to the student.
- D^3^ETR: Decoder Distillation for Detection Transformer
- Xiaokang Chen and Jiahui Chen and Yan Liu and Gang Zeng
KD-DETR. [arXiv] - A general knowledge distillation paradigm for DETR with consistent distillation points sampling.
- Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
- Yu Wang and Xin Li and Shengzhao Wen and Fukui Yang and Wanping Zhang and Gang Zhang and Haocheng Feng and Junyu Han and Errui Ding
G-DetKD. ICCV 2021. [CVF] [IEEE Xplore] [arXiv] - A novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels to provide the optimal guidance to the student.
- G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
- Lewei Yao and Renjie Pi and Hang Xu and Wei Zhang and Zhenguo Li and Tong Zhang
HEAD. ECCV 2022. [ECVA] [Springer] [arXiv] <GitHub> - HEtero-Assists Distillation leveraging heterogeneous detection heads as assistants to guide the optimization of the student detector.
- HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
- Luting Wang and Xiaojie Li and Yue Liao and Zeren Jiang and Jianlong Wu and Fei Wang and Chen Qian and Si Liu
PKD. NeurIPS 2022. [OpenReview] [arXiv] <GitHub> - Imitates features with the Pearson correlation coefficient to focus on the relational information from the teacher and relax constraints on the magnitude of the features (sketch below).
- PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient
- Weihan Cao and Yifan Zhang and Jianfei Gao and Anda Cheng and Ke Cheng and Jian Cheng
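A sketch of Pearson-correlation feature imitation: after standardization (zero mean, unit variance), an MSE is equivalent, up to a constant, to maximizing the Pearson correlation between the features. The normalization axes here are an assumption:

```python
import torch
import torch.nn.functional as F

def pkd_loss(f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
    def standardize(f: torch.Tensor) -> torch.Tensor:
        # Zero mean, unit variance per channel across batch and space.
        mean = f.mean(dim=(0, 2, 3), keepdim=True)
        std = f.std(dim=(0, 2, 3), keepdim=True)
        return (f - mean) / (std + 1e-6)
    return F.mse_loss(standardize(f_student), standardize(f_teacher.detach()))
```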
MimicDet. ECCV 2020. [ECVA] [Springer] [arXiv] - A novel and efficient framework to train a one-stage detector by directly mimicking the two-stage features, aiming to bridge the accuracy gap between one-stage and two-stage detectors.
- MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection
- Xin Lu and Quanquan Li and Buyu Li and Junjie Yan
LabelEnc. ECCV 2020. [ECVA] [Springer] [arXiv] <GitHub> - A new intermediate supervision method to boost the training of object detection systems.
- LabelEnc: A New Intermediate Supervision Method for Object Detection
- Miao Hao and Yitao Liu and Xiangyu Zhang and Jian Sun
LGD. AAAI 2022. [AAAI] [arXiv] - The first self-distillation framework for general object detection.
- LGD: Label-Guided Self-Distillation for Object Detection
- Peizhen Zhang and Zijian Kang and Tong Yang and Xiangyu Zhang and Nanning Zheng and Jian Sun
SSD-Det. ICCV 2023. [CVF] [IEEE Xplore] [arXiv] - Mines spatial information to refine inaccurate boxes in a self-distillation fashion.
- Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes
- Di Wu and Pengfei Chen and Xuehui Yu and Guorong Li and Zhenjun Han and Jianbin Jiao
TPAMI. [IEEE Xplore] - A comprehensive survey of KD-based object detection models.
- When Object Detection Meets Knowledge Distillation: A Survey
- Zhihui Li and Pengfei Xu and Xiaojun Chang and Luyao Yang and Yuanyuan Zhang and Lina Yao and Xiaojiang Chen
ScaleKD. CVPR 2023. [CVF] - Consists of a Scale-Decoupled Feature distillation module and a Cross-Scale Assistant.
- ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector
- Yichen Zhu and Qiqi Zhou and Ning Liu and Zhiyuan Xu and Zhicai Ou and Xiaofeng Mou and Jian Tang