Reproducing Imagenet Knowledge Transfer Top-1 Accuracy

Question

amiller195 opened this issue 4 years ago · 3 comments

Hi,
Very interesting work!
According to Table 6 in the paper, training for 90 epochs on the 140K generated dataset should reach a top-1 accuracy of 68.0%.
I'm trying to train Resnet50v1.5 on the 140K dataset following the protocol at https://github.com/NVIDIA/DeepLearningExamples, but I can't get past 10% top-1 accuracy.

Can you please elaborate on the training process using the generated 140k images? What protocol or additional work was required to reach the mentioned accuracy?

Thanks!

Answer 1 · 2021-01-28T19:16:05.000Z

Use KL divergence instead of cross-entropy (CE), and rescale the KL divergence into the normal loss range. The distillation setup is detailed in Sec. 4.4.
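A minimal sketch of a KL-only distillation loss, in plain Python so it runs without a deep-learning framework. The function names are hypothetical. The "rescaling" step is assumed here to be the common convention from Hinton et al. of multiplying by T**2; whether Sec. 4.4 uses exactly this factor is not confirmed by this thread.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> list[float]:
    """Temperature-scaled log-softmax, computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def kd_kl_loss(student_logits: Sequence[float],
               teacher_logits: Sequence[float],
               temperature: float = 3.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, rescaled by T**2.

    Only teacher logits act as targets: there is no ground-truth label / CE term.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # Assumed rescaling: softening by T shrinks soft-target gradients by ~1/T**2,
    # so multiplying by T**2 brings the loss back to a typical magnitude.
    return kl * temperature ** 2


def kd_kl_loss_batch(student_batch, teacher_batch, temperature: float = 3.0) -> float:
    """Per-example loss (summed over classes), averaged over the batch."""
    losses = [kd_kl_loss(s, t, temperature) for s, t in zip(student_batch, teacher_batch)]
    return sum(losses) / len(losses)
```

For example, kd_kl_loss([0.0, 0.0], [0.0, math.log(3.0)], temperature=1.0) compares a teacher distribution of [0.25, 0.75] with a uniform student and returns 0.25·ln 0.5 + 0.75·ln 1.5 ≈ 0.1308. In a real training loop the same computation would run on batched tensors in the training framework.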

Answer 2 · 2021-02-20T23:44:00.000Z

Hi, thank you for the great work!

Sorry, I have the same question as above and wonder whether it has been resolved.

I couldn't reproduce the reported accuracy on Imagenet with the 140K images provided. Following Sec. 4.4 of the paper, I can only reach a little over 30% top-1 accuracy. My training setup: batch size 256, temperature 3, KL loss only (relying solely on teacher logits), 250 epochs, learning rate 1.0, and SGD with step decay every 80 epochs.
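The setup described in this comment can be written down as a small config plus a step-decay learning-rate function, which makes the schedule easy to compare against the paper. This is a sketch: the class and function names are hypothetical, and the decay factor lr_gamma = 0.1 is an assumption because the comment does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistillConfig:
    """Hyperparameters as reported in the comment; lr_gamma is an assumption."""
    batch_size: int = 256
    temperature: float = 3.0
    epochs: int = 250
    base_lr: float = 1.0
    lr_step_epochs: int = 80
    lr_gamma: float = 0.1  # hypothetical: the decay factor was not stated


def lr_at_epoch(cfg: DistillConfig, epoch: int) -> float:
    """Step decay: multiply base_lr by lr_gamma once every lr_step_epochs epochs."""
    if not 0 <= epoch < cfg.epochs:
        raise ValueError(f"epoch must be in [0, {cfg.epochs})")
    return cfg.base_lr * cfg.lr_gamma ** (epoch // cfg.lr_step_epochs)


def lr_boundaries(cfg: DistillConfig) -> list[tuple[int, float]]:
    """(first_epoch, lr) pairs, one per constant-LR segment of the schedule."""
    return [(e, lr_at_epoch(cfg, e)) for e in range(0, cfg.epochs, cfg.lr_step_epochs)]
```

Under these assumptions, lr_boundaries(DistillConfig()) yields segments starting at epochs 0, 80, 160, and 240 with rates 1.0, 0.1, 0.01, and 0.001 (up to floating-point rounding), so the final 10 epochs run at the smallest rate.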

Many thanks!

Answer 3 · 2023-03-31T07:58:34.000Z

Same question.