qinenergy/cotta

Question about the student model

Closed this issue · 5 comments

Aikoin commented

Dear author, thanks for sharing your work! I am currently reading your paper and I have a question about the student model's initialization. Specifically, I am wondering how the student model is initialized in your experiments. (I can't understand the code well and have difficulty finding this part TT)
If you have time, I would greatly appreciate it if you could provide some clarification on this point.
Thank you for your time and consideration^^

Hi,
the student model is initialized from the source pre-trained model provided by RobustBench. You can find the load_model call here:

base_model = load_model(cfg.MODEL.ARCH, cfg.CKPT_DIR,
                        ...)  # remaining arguments follow the RobustBench load_model API
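If it helps, the shared initialization plus EMA teacher update can be sketched like this (a minimal PyTorch sketch, not the repository's actual code; the tiny network and the `ema_update` helper are stand-ins for the RobustBench checkpoint and the mean-teacher update):

```python
import copy
import torch
import torch.nn as nn

# Stand-in for the source pre-trained model loaded from RobustBench.
base_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))

# Student and teacher both start from the same source weights.
student = copy.deepcopy(base_model)
teacher = copy.deepcopy(base_model)

# The teacher is not trained by gradients; it tracks the student via EMA.
for p in teacher.parameters():
    p.requires_grad_(False)

def ema_update(teacher, student, alpha=0.999):
    """Exponential moving average: teacher <- alpha*teacher + (1-alpha)*student."""
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(alpha).add_(ps, alpha=1 - alpha)
```

So at time step 0 the two models are identical; they diverge as the student is updated by the consistency loss and the teacher follows it via EMA.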

Aikoin commented

Thank you very much for your prompt reply! Your explanation was very helpful.
I have another question: If both student and teacher models are initialized with the same source pre-trained model, do they have any differences at time step 0? If not, how are the parameters of the student model updated since they make the same predictions and the cross-entropy loss becomes 0?
Thank you again for your help, and wish you a happy March^^

In our case, this does not happen, because we use the augmentation-averaged prediction from the teacher model but a single prediction from the student model.

Note that even if we ignore the augmentation difference, the cross-entropy loss will not be 0, because the network never assigns a confidence of exactly 1 to a single class. If you still have doubts, please check Wikipedia to see how softmax and cross-entropy work.
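To make this concrete, here is a small numeric sketch (plain Python; the softmax outputs are hypothetical) showing that the cross-entropy between two identical distributions equals the entropy, which is positive unless the distribution is one-hot, whereas the KL divergence between them is zero:

```python
import math

# Hypothetical softmax output; student equals teacher at time step 0.
p = [0.7, 0.2, 0.1]
q = list(p)

# Cross-entropy H(p, q) = -sum p_i * log q_i. With q = p it reduces to the
# entropy H(p), which is zero only for a one-hot distribution.
cross_entropy = -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# KL(p || q) = sum p_i * log(p_i / q_i). With q = p every ratio is 1, so it is 0.
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(round(cross_entropy, 4))  # 0.8018, strictly positive
print(kl)                       # 0.0
```

This is exactly the distinction between cross-entropy and KL divergence: the KL term vanishes for identical predictions, but the entropy term in the cross-entropy does not.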

Aikoin commented

Thank you so much for taking the time to answer my questions and clarify my confusion. I realize now that I actually confused cross-entropy and KL-divergence and asked some silly questions, but you were patient and kind in your response. Your expertise and willingness to help are truly appreciated. : )

Thanks for your interest in our work. Good luck on your research.