YanjingLi0202/Q-ViT

Where is Distribution Guided Distillation (DGD)?


I think `DistillationLoss` needs distribution guided distillation, i.e., a term over the query/key (q, k) pairs of every block.

However, I can't find a DGD function anywhere in the code.

Can this code reproduce the results reported in the paper without the DGD function?
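For reference, here is a rough sketch of the kind of per-block q, k term I was expecting to find. It is only an assumption of how such a loss could look, not code from this repository or the paper: `similarity_matrix` and `dgd_style_loss` are hypothetical names, each block is assumed to contribute one `(q, k)` pair of shape `(tokens, dim)` for a single head, and the "normalized similarity matrix + L2 distance" formulation is my own guess.

```python
import numpy as np


def similarity_matrix(x, eps=1e-8):
    # x: (tokens, dim) activations for one head of one block (assumed layout).
    # L2-normalize each token vector, then take pairwise dot products.
    x = x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)
    g = x @ x.T
    # L2-normalize each row of the (tokens, tokens) similarity matrix.
    return g / (np.linalg.norm(g, axis=-1, keepdims=True) + eps)


def dgd_style_loss(student_qk, teacher_qk):
    # student_qk / teacher_qk: lists with one (q, k) tuple per block (hypothetical).
    # Sums the Frobenius distance between student and teacher similarity
    # matrices for queries and keys over all blocks.
    total = 0.0
    for (q_s, k_s), (q_t, k_t) in zip(student_qk, teacher_qk):
        total += np.linalg.norm(similarity_matrix(q_s) - similarity_matrix(q_t))
        total += np.linalg.norm(similarity_matrix(k_s) - similarity_matrix(k_t))
    return total
```

Because of the normalization, rescaling q or k (e.g. by a quantization step size) leaves this term essentially unchanged, so it only penalizes differences in the *distribution* of token similarities, not their magnitude.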