Where is Distribution Guided Distillation (DGD)?
Opened this issue · 1 comment
pdh930105 commented
I think that DistillationLoss requires distribution-guided distillation (i.e., distilling every block's q, k pair).
However, I can't find the DGD function anywhere in the code.
Can this code reproduce the performance reported in the paper without the DGD function?
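To make the question concrete, here is a minimal sketch (my own guess, not the authors' implementation) of the kind of per-block q/k distillation term I expected to find, assuming the teacher and student each expose a list of `(q, k)` activation pairs with matching shapes:

```python
# Hypothetical sketch of a distribution-guided distillation term over
# per-block query/key activations; names and shapes are assumptions.
import torch
import torch.nn.functional as F


def qk_distillation_loss(student_qk, teacher_qk, eps=1e-6):
    """student_qk / teacher_qk: lists of (q, k) pairs, one per block.

    Each q/k tensor is assumed to have shape (batch, heads, tokens, dim).
    Activations are standardized along the last dim before the MSE,
    which is one plausible reading of "distribution guided" matching.
    """
    def standardize(x):
        return (x - x.mean(dim=-1, keepdim=True)) / (x.std(dim=-1, keepdim=True) + eps)

    loss = 0.0
    for (q_s, k_s), (q_t, k_t) in zip(student_qk, teacher_qk):
        loss = loss + F.mse_loss(standardize(q_s), standardize(q_t))
        loss = loss + F.mse_loss(standardize(k_s), standardize(k_t))
    return loss / max(len(student_qk), 1)
```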
XA23i commented
It's the same as in BiBERT [1]. In fact, I think the whole paper is quite similar to BiBERT and IR-Net [2], just carried over to model quantization. 😬
[1] https://github.com/htqin/BiBERT/blob/91fd347eefc490a87275e66be68bfceb27837aee/transformer/modeling_quant.py#L155
[2] https://openaccess.thecvf.com/content_CVPR_2020/papers/Qin_Forward_and_Backward_Information_Retention_for_Accurate_Binary_Neural_Networks_CVPR_2020_paper.pdf
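For reference, this is roughly what the direction-matching-style loss around modeling_quant.py#L155 in [1] does, as I read it (a paraphrased sketch, not a verbatim copy of BiBERT's code): normalized similarity patterns of the student's and teacher's q/k activations are matched with an MSE.

```python
# Rough sketch of a BiBERT-style direction-matching distillation term
# for one block's query/key activations; shapes are assumptions.
import torch
import torch.nn.functional as F


def direction_matching_loss(q_s, k_s, q_t, k_t):
    """q_*, k_*: tensors of shape (batch, heads, tokens, dim) from one block."""
    def pattern(x):
        # Cosine-like similarity pattern: row-normalize, then x @ x^T.
        x = F.normalize(x, dim=-1)
        return x @ x.transpose(-1, -2)

    return F.mse_loss(pattern(q_s), pattern(q_t)) + F.mse_loss(pattern(k_s), pattern(k_t))
```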