About the training process
Sharpiless opened this issue · 3 comments
Thanks for your work and code. I see that you have provided the weights of a teacher model trained on CIFAR. Why does the README say that the teacher should be retrained?
It's just an option. You don't need to retrain the teacher. I only wanted to show in the README that training the teacher yourself is possible. I'll change the wording you pointed out.
Thanks for your reply. One more question: the original paper uses cross-entropy loss, while your code uses BCE loss. Is that the right thing to do?
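For reference, the two formulations differ roughly like this (a minimal PyTorch sketch; the variable names `outputs_S` / `outputs_T` follow this thread, but the shapes, temperature value, and random logits are illustrative assumptions, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
outputs_S = torch.randn(4, 10)  # student logits: batch of 4, 10 classes (illustrative)
outputs_T = torch.randn(4, 10)  # teacher logits (illustrative)
T = 2.0                         # softmax temperature (assumed value)

# Paper-style loss: cross entropy / KL divergence between the softened
# teacher and student class distributions (softmax over classes).
loss_ce = F.kl_div(
    F.log_softmax(outputs_S / T, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction="batchmean",
) * T * T

# BCE-style variant: treats each class score independently via a sigmoid,
# rather than as one softmax distribution over classes.
loss_bce = F.binary_cross_entropy(
    torch.sigmoid(outputs_S),
    torch.sigmoid(outputs_T),
)

print(loss_ce.item(), loss_bce.item())
```

The practical difference is that softmax cross entropy couples the classes into a single distribution, while BCE scores each class independently, so the two losses generally give different gradients.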
There are also some implementation details that differ from the original paper.
For MNIST, the batch size in the paper is 512 and the generator learning rate is 3.0 with 24k samples. The temperature should also divide outputs_S instead of outputs_T. Would you like to re-implement this repo?
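The temperature point above can be sketched as follows (hedged: the variable names follow this thread, the KL-based loss and random logits are illustrative assumptions, and note that in the standard Hinton et al. formulation both logits are divided by the temperature):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
outputs_S = torch.randn(4, 10)  # student logits (illustrative)
outputs_T = torch.randn(4, 10)  # teacher logits (illustrative)
T = 2.0                         # temperature (assumed value)

# Variant A: temperature applied only to the teacher logits.
loss_teacher_only = F.kl_div(
    F.log_softmax(outputs_S, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction="batchmean",
)

# Variant B: temperature applied to the student logits as well
# (the standard knowledge-distillation formulation).
loss_both = F.kl_div(
    F.log_softmax(outputs_S / T, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction="batchmean",
)

print(loss_teacher_only.item(), loss_both.item())
```

With T > 1 the two variants soften the student distribution differently, so they produce different loss values and gradients.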