About the training process
Sharpiless opened this issue · 3 comments
Thanks for your work and code. I see that you have provided the weights of a teacher model trained on CIFAR. Why does the README say that the teacher should be retrained?
It's just an option. You don't need to retrain the teacher. I only wanted to show in the README that training the teacher yourself is possible. I'll change the wording you pointed out.
Thanks for your reply. One more question: the original paper uses cross-entropy loss, while your code uses BCE loss. Is that the right thing to do?
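For reference, the two formulations differ roughly like this (a minimal PyTorch sketch; the variable names `outputs_S` / `outputs_T` follow this thread, but the shapes, temperature value, and random logits are illustrative assumptions, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
outputs_S = torch.randn(4, 10)  # student logits: batch of 4, 10 classes (illustrative)
outputs_T = torch.randn(4, 10)  # teacher logits (illustrative)
T = 2.0                         # softmax temperature (assumed value)

# Paper-style loss: cross entropy / KL divergence between the softened
# teacher and student class distributions (softmax over classes).
loss_ce = F.kl_div(
    F.log_softmax(outputs_S / T, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction="batchmean",
) * T * T

# BCE-style variant: treats each class score independently via a sigmoid,
# rather than as one softmax distribution over classes.
loss_bce = F.binary_cross_entropy(
    torch.sigmoid(outputs_S),
    torch.sigmoid(outputs_T),
)

print(loss_ce.item(), loss_bce.item())
```

The practical difference is that softmax cross entropy couples the classes into a single distribution, while BCE scores each class independently, so the two losses generally give different gradients.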
There are also some implementation details that differ from the original paper.
For MNIST, the batch size in the paper is 512 and the generator learning rate is 3.0 with 24k samples. The temperature should also divide outputs_S instead of outputs_T. Would you like to re-implement this repo?
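The temperature point above can be sketched as follows (hedged: the variable names follow this thread, the KL-based loss and random logits are illustrative assumptions, and note that in the standard Hinton et al. formulation both logits are divided by the temperature):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
outputs_S = torch.randn(4, 10)  # student logits (illustrative)
outputs_T = torch.randn(4, 10)  # teacher logits (illustrative)
T = 2.0                         # temperature (assumed value)

# Variant A: temperature applied only to the teacher logits.
loss_teacher_only = F.kl_div(
    F.log_softmax(outputs_S, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction="batchmean",
)

# Variant B: temperature applied to the student logits as well
# (the standard knowledge-distillation formulation).
loss_both = F.kl_div(
    F.log_softmax(outputs_S / T, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction="batchmean",
)

print(loss_teacher_only.item(), loss_both.item())
```

With T > 1 the two variants soften the student distribution differently, so they produce different loss values and gradients.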