
Reproducing ImageNet Knowledge Transfer Top-1 Accuracy

amiller195 opened this issue · 3 comments

Very interesting work!
According to Table 6 in the paper, training for 90 epochs on the 140k generated dataset should reach a top-1 accuracy of 68.0%.
I'm trying to train ResNet50 v1.5 following the protocol at https://github.com/NVIDIA/DeepLearningExamples with the 140k dataset, but I can't get past 10% top-1 accuracy.

Can you please elaborate on the training process using the generated 140k images? What protocol or additional work was required to reach the mentioned accuracy?


The key differences are using KL divergence instead of cross-entropy (CE) and rescaling the KL divergence into the normal loss range. The distillation setup is detailed in Sec. 4.4 of the paper.
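
For readers following along, here is a minimal sketch of what a temperature-scaled KL distillation loss can look like for a single sample. It is not the authors' code. The function names `log_softmax` and `kd_kl_loss` are hypothetical, and multiplying by T^2 is the common convention from the knowledge distillation literature, assumed here as one possible reading of "rescaling"; the thread does not say which rescaling the paper uses.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def kd_kl_loss(student_logits, teacher_logits, temperature=3.0):
    """KL(teacher || student) on temperature-softened distributions.

    ASSUMPTION: the result is multiplied by temperature**2 so that its
    magnitude stays comparable to an ordinary loss. This is a common
    convention, not something confirmed for the paper in this thread.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [0.1, 0.2, 0.3]
    print("KD loss:", kd_kl_loss(student, teacher, temperature=3.0))
```

In a real training loop the same quantity would be averaged over the batch and computed with the framework's own log-softmax and KL-divergence functions rather than plain Python lists.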

Hi, thank you for the great work!

Sorry, I have the same question as above and wonder whether it has been resolved.

I couldn't reproduce the reported ImageNet accuracy with the 140k images provided. Following Sec. 4.4 of the paper, I can only reach just over 30% top-1 accuracy. My training setup: batch size 256, temperature 3, KL loss only (relying solely on teacher logits), 250 epochs, learning rate 1.0, and SGD with a learning-rate decay step every 80 epochs.

Many thanks!
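
To make the schedule in the comment above concrete, here is a small sketch of a step-decay learning rate using the stated values (initial learning rate 1.0, a decay every 80 epochs, 250 epochs total). The decay factor is not given in the thread, so `gamma=0.1` is purely an assumption, and `step_decay_lr` is a hypothetical helper name.

```python
def step_decay_lr(epoch, base_lr=1.0, step_size=80, gamma=0.1):
    """Learning rate at a given epoch under step decay.

    base_lr and step_size come from the comment above.
    ASSUMPTION: gamma (the multiplicative decay factor) is not stated
    in the thread; 0.1 is only a placeholder.
    """
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    total_epochs = 250
    # Print the learning rate at the first epoch and at each decay boundary.
    for epoch in range(0, total_epochs, 80):
        print(f"epoch {epoch:3d}: lr = {step_decay_lr(epoch):g}")
```

With these assumed values the learning rate drops three times within 250 epochs, at epochs 80, 160, and 240.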

same question