TimDettmers/sparse_learning

What's the baseline resnet50 model used in sparse_learning experiments?

yuanyuanli85 opened this issue · 5 comments

Table 3 in the paper shows the accuracy of ResNet-50 and compares it with other approaches. The proposed method, sparse momentum, achieved 74.9% top-1 accuracy. Does the baseline model have 79.3% top-1? If so, the sparse model has a 4.4% drop compared to the dense baseline model.

Thank you for pointing this out, this is a common question that I get!

I misreported the multi-crop top-1/top-5 numbers for the baseline; this will be corrected in the new version of the paper (you can expect it on Wednesday). The scores in that table come from a ResNet-50 baseline trained with a cosine learning rate schedule with warmup and label smoothing, but with fully sparse layers, which yields 77.0% accuracy at 100% weights. You can find the code for this in the fully sparse subfolder. I also replicated the results on the dynamic sparse codebase, with dense first convolution and downsample convolutional layers (as done by dynamic sparse); that code is in the partially dense subfolder. That ResNet-50 has a baseline performance of 74.9% accuracy at 100% weights. In the latter case, sparse momentum still retains state-of-the-art performance, with 72.4% and 74.2% top-1 for 10% and 20% weights. Does this answer your question?
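For anyone trying to reproduce the baseline, the two training tricks mentioned above (cosine learning rate schedule with warmup, and label smoothing) can be sketched as below. This is an illustrative sketch only; the specific hyperparameters (base LR 0.1, 5 warmup epochs, smoothing factor 0.1) are assumptions for the example, not values taken from the repo or the paper.

```python
import math

def lr_at_epoch(epoch, total_epochs=100, base_lr=0.1, warmup_epochs=5):
    """Cosine learning-rate schedule with linear warmup (hyperparameters are
    illustrative assumptions, not the paper's exact settings)."""
    if epoch < warmup_epochs:
        # Linear warmup: ramp from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: mix a one-hot target with the uniform distribution."""
    n = len(one_hot)
    return [p * (1.0 - eps) + eps / n for p in one_hot]
```

For example, `lr_at_epoch(0)` gives 0.02 (first warmup step), `lr_at_epoch(5)` gives the full 0.1, and the rate then decays smoothly toward 0; `smooth_labels([1, 0, 0, 0])` spreads 10% of the probability mass uniformly across the four classes.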

Let me know if something is still unclear or if you have more questions.

Thank you for your quick response and clarification. Looking forward to the new version of the paper.

Dear Tim, the paper on arXiv still shows the same accuracy numbers. Where is the new version of the paper?

Will be up in 2-3 days. I needed to change a couple of things.

I just submitted the paper to arXiv. It will be released on Monday.