SforAiDl/KD_Lib

[Paper] Regularizing Class-wise Predictions via Self-knowledge Distillation

ashwinvaswani opened this issue · 0 comments

Description

Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate this, the authors propose a new regularization method that penalizes the discrepancy between the predictive distributions of similar samples. In particular, during training they distill the predictive distribution between different samples of the same label. This regularizes the dark knowledge (i.e., the knowledge on wrong predictions) of a single network (i.e., self-knowledge distillation) by forcing it to produce more meaningful and consistent predictions in a class-wise manner. Consequently, it mitigates overconfident predictions and reduces intra-class variations. Experimental results on various image classification tasks demonstrate that this simple yet powerful method can significantly improve not only the generalization ability but also the calibration performance of modern convolutional neural networks.
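
The core idea reduces to a small extra loss term. Below is a minimal, hypothetical PyTorch-style sketch (not the existing KD_Lib API) of such a class-wise self-distillation loss: predictions on one batch are matched, via temperature-softened KL divergence, against detached predictions on a second batch whose samples share the same labels. The function name, argument names, and default hyperparameters (`temperature`, `lambda_cls`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def class_wise_self_kd_loss(logits, paired_logits, labels,
                            temperature=4.0, lambda_cls=1.0):
    """Sketch of a class-wise self-knowledge-distillation loss.

    logits        -- model outputs for a batch of samples x
    paired_logits -- model outputs for different samples x' that share
                     the same labels as x (used as a fixed target)
    labels        -- ground-truth labels for x
    """
    # Standard cross-entropy on the primary samples.
    ce = F.cross_entropy(logits, labels)

    # Soften both distributions with the temperature and match them with KL
    # divergence, detaching the paired predictions so they act as a fixed
    # "teacher" coming from the same network.
    log_p = F.log_softmax(logits / temperature, dim=1)
    q = F.softmax(paired_logits.detach() / temperature, dim=1)
    kd = F.kl_div(log_p, q, reduction="batchmean") * (temperature ** 2)

    return ce + lambda_cls * kd
```

In practice, each training batch would be paired with a second batch sampled so that corresponding entries share a label (e.g., via a class-balanced pair sampler), with both batches passed through the same network to produce `logits` and `paired_logits`.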