chen742/PiPa

patch loss

Closed this issue · 11 comments

Why does your patch loss use the features of the teacher network? And how do you train self.classifier in dacs.py?

I'm also curious about this. The performance gain may be caused by the teacher model being updated by BP (back propagation) rather than EMA (exponential moving average). Could you please explain the reason? @chen742

Have you tried the patch loss based on features of the student model? @wzlyyb

@super233 Actually, the teacher model has no gradient flow.

def _init_ema_weights(self):
    # Detach every teacher (EMA model) parameter so no gradient flows through it.
    for param in self.get_ema_model().parameters():
        param.detach_()

The patch loss may not have been applied in this code.

_init_ema_weights() is only used in the first iteration. After the first iteration, the teacher network is updated by EMA.
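
For context, the EMA update in DACS-style self-training usually looks like the sketch below; this is a minimal sketch assuming a momentum hyperparameter self.alpha and the get_ema_model()/get_model() accessors quoted above, and the exact schedule may differ from this repo:

def _update_ema(self, iter):
    # Sketch of a standard EMA teacher update (names and schedule are illustrative).
    # The momentum ramps up from 0 so the teacher tracks the student quickly
    # at the start of training, then settles at self.alpha.
    alpha = min(1 - 1 / (iter + 1), self.alpha)
    for ema_param, param in zip(self.get_ema_model().parameters(),
                                self.get_model().parameters()):
        # Pure in-place weight blending; no gradient flows through this.
        ema_param.data.mul_(alpha).add_(param.data, alpha=1 - alpha)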

I agree that the patch loss may not have been applied in this code.

Thanks for your reply. Then why does your patch loss use the features of the teacher network?

Because I am not the author. I am also waiting for the author's reply.

@chen742

Hi @super233, in our code we use the teacher branch to perform the patch loss for the ablation study. As @zyuanbing says, there is no gradient flow in the teacher branch; you could generate the features from the student branch instead. We will provide the updated code soon.
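
A minimal sketch of that suggestion, as it might appear inside the training step; extract_feat() follows the mmsegmentation EncoderDecoder convention, and patch_contrastive_loss is a hypothetical placeholder for the patch loss, not the repo's actual function:

# Teacher features: detached, so a loss computed on them cannot
# update any network (as in the quoted _init_ema_weights).
with torch.no_grad():
    teacher_feat = self.get_ema_model().extract_feat(img)

# Student features: gradients flow, so a patch loss computed here
# actually trains the student encoder.
student_feat = self.get_model().extract_feat(img)
patch_loss = patch_contrastive_loss(student_feat)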

Do you know how the author trains self.classifier in dacs.py?

Do you know the answer now?
Also, I haven't found how the author trains self.cls_head in encoder_decoder.py. Have you figured it out?