irfanICMLL/TorchDistiller

KLD in Object Detection


Hi, irfanICMLL.

Thanks for kindly sharing this repository.
During training, I observed that the CWD losses (cwd_0, ..., cwd_3) did not decrease, even though they started at very high values.
My assumption is that the KL divergence loss used for channel-wise distillation may simply be very hard to minimize.
(I did experiments on this repo: https://github.com/pppppM/mmdetection-distiller)
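
For context, this is roughly how I understand the channel-wise distillation loss: a temperature-scaled softmax over the spatial positions of each channel, followed by a KL divergence between the teacher and student channel distributions. This is only my own minimal sketch; the function name, tensor names, and the temperature default are my assumptions, not code from either repo:

```python
import torch
import torch.nn.functional as F

def cwd_kl_loss(student_feat: torch.Tensor,
                teacher_feat: torch.Tensor,
                tau: float = 4.0) -> torch.Tensor:
    """Sketch of a KL-based channel-wise distillation loss.

    student_feat, teacher_feat: feature maps of shape (N, C, H, W).
    tau: softmax temperature (hypothetical default, not from the repo).
    """
    n, c, h, w = student_feat.shape
    # Flatten spatial dims so each channel becomes a distribution over H*W positions.
    s = student_feat.view(n, c, -1)
    t = teacher_feat.view(n, c, -1)
    # Temperature-scaled softmax over spatial positions.
    log_p_s = F.log_softmax(s / tau, dim=-1)
    p_t = F.softmax(t / tau, dim=-1)
    # KL(teacher || student) per channel, then averaged over batch and channels;
    # tau**2 compensates for the gradient scaling introduced by the temperature.
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=-1)  # (N, C)
    return kl.mean() * (tau ** 2)

# Quick sanity check with random features:
s = torch.randn(2, 256, 32, 32)
t = torch.randn(2, 256, 32, 32)
print(cwd_kl_loss(s, t))
```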

Do you agree with this assumption? And how do other losses (L2, L1, cosine similarity) perform for CWD in your experience?
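
For clarity, these are the alternative losses I have in mind, applied to the same feature maps. Again just a sketch with names I made up, not code from this repository:

```python
import torch
import torch.nn.functional as F

def cwd_l2_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Mean squared error between raw feature maps.
    return F.mse_loss(s, t)

def cwd_l1_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Mean absolute error between raw feature maps.
    return F.l1_loss(s, t)

def cwd_cosine_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # 1 - cosine similarity, computed per channel over flattened spatial dims.
    n, c = s.shape[:2]
    sim = F.cosine_similarity(s.view(n, c, -1), t.view(n, c, -1), dim=-1)  # (N, C)
    return (1.0 - sim).mean()
```

Unlike the KL formulation, these operate on raw activations rather than normalized spatial distributions, so I would expect their training dynamics to differ.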