NVlabs/DeepInversion

Why is the student model *pretrained* in the ImageNet ADI experiment?

MingSun-Tse opened this issue · 3 comments

Hi, thanks for your great work! I noticed that the student network is pretrained in the ADI experiment on ImageNet. This seems strange, since in data-free knowledge distillation the goal is to train a student on the synthetic samples; if you already have a pretrained student, the problem is solved from the start.

Meanwhile, in the CIFAR-10 experiment the student is not pretrained, which I believe is the normal setting, so there is an inconsistency between the two. Could you explain a little why you chose different schemes for CIFAR-10 and ImageNet? Thanks!

Hi, thanks for looking into the details. The provided scripts are meant as toy examples, and we wanted to show them producing a meaningful result, so we used a different, already-trained model as the student. In the original paper, ADI is always used with a student that is undergoing the training procedure and is not pretrained.
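For anyone following along, here is a minimal sketch of the loop described above: the teacher is pretrained, the student starts from random weights, and each synthetic batch is optimized against both networks before a distillation step updates the student. The model choices, hyperparameters, and the simplified loss (no BN-statistics matching or image priors) are illustrative assumptions, not the repo's actual settings.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained teacher (frozen); the student starts from random weights and
# is the network actually being distilled. All hyperparameters below are
# illustrative, not the paper's exact settings.
teacher = resnet50(pretrained=True).to(device).eval()
for p in teacher.parameters():
    p.requires_grad_(False)
student = resnet18(num_classes=1000).to(device)       # NOT pretrained
student_opt = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)

def synthesize_batch(batch_size=256, iters=2000):
    """Optimize random noise into one batch of synthetic images.

    Simplified stand-in for the paper's objective: a classification loss on
    the teacher plus an adaptive term that rewards images on which the
    (partially trained) student disagrees with the teacher. DeepInversion's
    BN feature-statistics and image-prior terms are omitted here.
    """
    targets = torch.randint(0, 1000, (batch_size,), device=device)
    images = torch.randn(batch_size, 3, 224, 224, device=device, requires_grad=True)
    img_opt = torch.optim.Adam([images], lr=0.05)
    for _ in range(iters):
        img_opt.zero_grad()
        t_logits = teacher(images)
        s_logits = student(images)
        loss = F.cross_entropy(t_logits, targets)
        # Adaptive term: push images toward regions of teacher/student disagreement.
        loss = loss - F.kl_div(F.log_softmax(s_logits, dim=1),
                               F.softmax(t_logits, dim=1),
                               reduction="batchmean")
        loss.backward()
        img_opt.step()
    return images.detach()

# Outer data-free KD loop: alternate between synthesizing a batch and
# taking a distillation step on the student (batch count is arbitrary here).
for step in range(1000):
    images = synthesize_batch()
    student_opt.zero_grad()
    kd_loss = F.kl_div(F.log_softmax(student(images), dim=1),
                       F.softmax(teacher(images), dim=1).detach(),
                       reduction="batchmean")
    kd_loss.backward()
    student_opt.step()
```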

Hi, many thanks for the update! What about the complete data-free KD code? Several people have questions about the details. When will you release it?

"ADI is always used with student that is undergoing the training procedure and is not pertained." This approach leads to a huge computational cost. Because every time a batch (256 in total) of data is generated, it goes through 2000 iterations of training. And I now have doubts about whether the paper can be reproduced.