sseung0703/KD_methods_with_TF

Probably mistaken implementation of the RKD method

wisdom0530 opened this issue · 3 comments

Hi!
I am very grateful for your code. It helps me a lot.
I have 2 questions:
(1) In the RKD paper, the authors say in the image classification section: "RKD-D and RKD-A are applied on the last pooling layer of the teacher and the student." However, in the code you provide, the RKD method is applied to the logits. I think this may be a mistake.
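The distinction matters because RKD-D is computed from pairwise distances of whatever representation is fed in, so swapping logits for pooled embeddings changes the loss. Below is a minimal NumPy sketch of the distance-wise loss; the function names are my own and this is not the repo's TF implementation, just an illustration of what the loss does to its input tensor.

```python
import numpy as np

def pairwise_dist(x):
    # Euclidean distance matrix between the rows of x (one row per sample).
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives from rounding

def rkd_d_loss(teacher_emb, student_emb):
    # Distance-wise RKD loss, sketched: normalize each distance matrix by its
    # mean over off-diagonal pairs, then compare with a Huber (smooth L1) penalty.
    t = pairwise_dist(teacher_emb)
    s = pairwise_dist(student_emb)
    mask = ~np.eye(t.shape[0], dtype=bool)      # ignore the zero diagonal
    t = t / t[mask].mean()
    s = s / s[mask].mean()
    diff = np.abs(s - t)[mask]
    huber = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return huber.mean()
```

Feeding `teacher_emb`/`student_emb` with last-pooling-layer features versus logits is then just a choice of which tensor to pass in; the loss itself is unchanged.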
(2) What TensorFlow version is used in this code? Could you please add a "requirements" file to this project?

Thank you for your comments.
(1) I checked the paper again and found what you described. You're right; it is my mistake. I was confused by this line on page 5: "We apply RKD-D and RKD-A on the final embedding outputs of the teacher and the student." I'll correct this error and update the experimental results.
(2) I used TF 1.13 for all the experiments. Your suggestion is useful for my repo; I'll add a "requirements" file soon.
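For reference, a minimal requirements file matching the version mentioned above might look like the following (the exact patch version and any other pins are my assumption, not taken from the repo):

```
tensorflow-gpu==1.13.1
numpy
scipy
```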

I checked the authors' repository again and found that they also used logits.
I don't know which one is right, but the implementation seems more believable.

I checked the paper carefully, and I found that

"RKD-D and RKD-A are applied on the last pooling layer of the teacher and the student, as they produce the final embedding before classification."
and
"As the prototypical networks build on shallow networks that consist of only 4 convolutional layers, we use the same architecture for the student model and the teacher, i.e., self-distillation, rather than using a smaller student network. We apply RKD, FitNet, and Attention on the final embedding output of the teacher and the student."

It is very interesting: the only difference between the two settings is the amount of data, yet it made the authors change the inputs to their algorithm.