The goal of knowledge distillation is to improve the performance of a smaller model, which typically has fewer parameters, by allowing it to learn from a more competent model, the teacher. The smaller model, or student, extracts knowledge from the teacher by matching its class distribution to the teacher's. To soften the distributions used in the loss function during training, we apply a temperature T, dividing the logits by T before the softmax. This project designates EfficientNet-B0 as the teacher and SqueezeNet v1.1 as the student, and both are trained and evaluated on the DermaMNIST dataset from MedMNIST. The results section compares the performance of the teacher, the student trained without knowledge distillation, and the student trained with knowledge distillation. A sketch of the temperature-scaled distillation loss is shown below.
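The following is a minimal sketch of the temperature-scaled distillation loss described above, written in PyTorch. The function name, the temperature, and the weighting factor `alpha` are illustrative assumptions, not the exact settings used in this project.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Combine a soft-target (KL) loss with a hard-target (CE) loss.

    `temperature` and `alpha` are placeholder values for illustration.
    """
    # Soften both distributions by dividing the logits by the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)

    # KL divergence between the softened distributions, scaled by T^2 so the
    # gradient magnitude stays comparable across temperatures.
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2

    # Standard cross-entropy against the ground-truth class indices.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In a typical training loop, the teacher's logits would be computed under `torch.no_grad()` so that only the student's parameters are updated by this loss.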
To see the distillation in action, please refer to the notebook at the following link.
The quantitative results are summarized in the table below.
| Model | Loss | Accuracy |
|---|---|---|
| Teacher | 1.935 | 71.61% |
| Student | 1.932 | 69.02% |
| Distilled | 1.918 | 73.44% |
Loss curves of the teacher model on the training and validation sets.
Accuracy curves of the teacher model on the training and validation sets.
Loss curves of the student model on the training and validation sets.
Accuracy curves of the student model on the training and validation sets.
Loss curves of the distilled model on the training and validation sets.
Accuracy curves of the distilled model on the training and validation sets.
Comparison of the validation loss curves of the teacher, student, and distilled models.
Comparison of the validation accuracy curves of the teacher, student, and distilled models.
The qualitative results of the models on the test set are shown below.
The qualitative results of the teacher model.