Knowledge Distillation for Skin Lesion Classification

The goal of knowledge distillation is to improve the performance of a weaker model (the student), which usually has fewer parameters, by letting it learn from a more capable model (the teacher). The student extracts knowledge from the teacher by matching its predicted class distribution to the teacher's. To soften these distributions, which enter the training loss, we apply a temperature T: the logits are divided by T before the softmax. This project uses EfficientNet-B0 as the teacher and SqueezeNet v1.1 as the student. Both models are trained and evaluated on the DermaMNIST dataset from MedMNIST. The Result section compares the teacher, the student without knowledge distillation, and the student with knowledge distillation.
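The temperature-softened matching described above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code. The function names, the `alpha` weighting between the hard-label and soft-label terms, the T^2 scaling of the soft term, and the example logits are all assumptions, chosen to follow the commonly used formulation of distillation.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; T > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and softened teacher-student KL.

    alpha and the T^2 factor are assumptions for this sketch, not values from the project.
    """
    # Hard-label term: cross-entropy of the student's unsoftened (T = 1) prediction.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL divergence between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student)
    # Multiplying by T^2 keeps the soft-term gradients on a comparable scale as T changes.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft

# Hypothetical logits for a three-class problem.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]

sharp = softmax(teacher_logits)                   # peaked distribution at T = 1
soft = softmax(teacher_logits, temperature=4.0)   # flatter distribution at T = 4
loss = distillation_loss(student_logits, teacher_logits, label=0)
```

A higher temperature spreads probability mass onto the non-target classes, so the student also sees how the teacher ranks the wrong answers rather than only its top prediction.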

Experiment

To witness the distillation in action, please refer to the notebook at the following link.

Result

Quantitative Result

The quantitative results are summarized in the table below.

Model      Loss    Accuracy
Teacher    1.935   71.61%
Student    1.932   69.02%
Distilled  1.918   73.44%

Accuracy and Loss Curve

Teacher

[Figure: teacher_loss_curve] Training and validation loss curves of the teacher model.

[Figure: teacher_acc_curve] Training and validation accuracy curves of the teacher model.

Student

[Figure: student_loss_curve] Training and validation loss curves of the student model.

[Figure: student_acc_curve] Training and validation accuracy curves of the student model.

Distilled

[Figure: distilled_loss_curve] Training and validation loss curves of the distilled model.

[Figure: distilled_acc_curve] Training and validation accuracy curves of the distilled model.

Overall Validation Curve

[Figure: overall_loss] Validation loss curves of the teacher, student, and distilled models.

[Figure: overall_acc] Validation accuracy curves of the teacher, student, and distilled models.

Qualitative Result

The qualitative results of the models on the test set are collated below.

Teacher

[Figure: teacher_qualitative] Qualitative results of the teacher model.

Student

[Figure: student_qualitative] Qualitative results of the student model.

Distilled

[Figure: distilled_qualitative] Qualitative results of the distilled model.

Credit