What's the baseline ResNet-50 model used in the sparse_learning experiments?
yuanyuanli85 opened this issue · 5 comments
Thank you for pointing this out, this is a common question that I get!
I misreported the multi-crop top-1/top-5 scores for the baseline; this will be corrected in the new version of the paper (you can expect it on Wednesday). The scores in that table come from a ResNet-50 baseline trained with a cosine learning rate schedule with warmup and label smoothing, and with fully sparse layers, which yields 77.0% accuracy at 100% weights. You can find the code for this in the fully sparse subfolder.

I also replicated the results on the dynamic sparse codebase with a dense first convolution and dense downsample convolution layers (as done by dynamic sparse); that code can be found in the partially dense subfolder. This ResNet-50 has a baseline performance of 74.9% accuracy at 100% weights. In the latter case, sparse momentum still retains state-of-the-art performance, with 72.4% and 74.2% top-1 accuracy at 10% and 20% weights, respectively.

Does this answer your question?
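For anyone reproducing the baseline, here is a minimal sketch of the "cosine learning rate schedule with warmup" mentioned above. The exact base learning rate, warmup length, and step granularity (per-epoch vs. per-iteration) are assumptions, not taken from the paper; check the subfolder's training script for the values actually used.

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=0.1, warmup_steps=5):
    """Cosine learning-rate decay with a linear warmup phase.

    NOTE: base_lr=0.1 and warmup_steps=5 are hypothetical defaults for
    illustration only; the baseline's real hyperparameters may differ.
    """
    if step < warmup_steps:
        # Linear warmup: ramp from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Example: print the schedule for a short 10-step run.
for s in range(10):
    print(s, round(cosine_lr_with_warmup(s, total_steps=10), 4))
```

In PyTorch this is commonly expressed with `torch.optim.lr_scheduler.CosineAnnealingLR` preceded by a warmup phase; the standalone function above just makes the shape of the schedule explicit.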
Let me know if something is still unclear or if you have more questions.
Thank you for your quick response and clarification. Looking forward to the new version of the paper.
Dear Tim, the paper on arXiv still shows the same precision. Where is the new version of the paper?
Will be up in 2-3 days. I needed to change a couple of things.
I just submitted the paper to arXiv. It will be released on Monday.