Ariande1/MS-ResNet

Sorry, I can't use the structure shown in the paper to get the same result on the CIFAR-10 dataset

Closed this issue · 7 comments


Could you describe your issue in more detail? Is there a specific structure that hasn’t achieved the accuracy reported in the paper?

Sure. I chose the structures shown in Table II; as I understand it, they are lightweight variants of the standard ResNet. Since they aren't included in the GitHub repository, I reproduced them by reducing the number of channels in your standard ResNet model. I would also like to confirm one thing about the CIFAR-10 experiments: according to Table VIII, you only use RandomCrop and normalization for the Table II models, but my results differ from yours by approximately 5 percentage points. Thank you for answering my questions.
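To be concrete, below is a minimal (non-spiking) sketch of what I mean by reducing the channels: a CIFAR-style ResNet-20 skeleton whose stage widths are narrowed to (16, 32, 64). The widths, class names, and plain ReLU blocks are my own placeholders, not the exact Table II configuration or the repo's spiking blocks.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Plain residual block; stands in for the repo's spiking block."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.downsample = None
        if stride != 1 or in_ch != out_ch:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + identity)


class LightResNet20(nn.Module):
    """ResNet-20 skeleton with narrowed stage widths (my guess: 16/32/64)."""
    def __init__(self, widths=(16, 32, 64), num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, widths[0], 3, 1, 1, bias=False),
            nn.BatchNorm2d(widths[0]),
            nn.ReLU(inplace=True),
        )
        blocks, in_ch = [], widths[0]
        for i, w in enumerate(widths):
            for j in range(3):                      # 3 blocks per stage
                stride = 2 if (i > 0 and j == 0) else 1
                blocks.append(BasicBlock(in_ch, w, stride))
                in_ch = w
        self.blocks = nn.Sequential(*blocks)
        self.head = nn.Linear(widths[-1], num_classes)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        x = x.mean(dim=(2, 3))                      # global average pooling
        return self.head(x)
```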

Oh, I noticed that the table you referenced regarding hyperparameters should come from the version we uploaded to arXiv. Strictly following the hyperparameters in that version will indeed result in an accuracy of 85-86% for SNN-ResNet20 in my reproduction as well.

The transformations used for Table II are RandomCrop, RandomHorizontalFlip, and normalization; this has been corrected in our TNNLS version. Additionally, I recommend setting the weight decay to 1e-4. This should lead to more satisfactory results. Apologies for any inconvenience caused.
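For reference, here is a minimal sketch of that augmentation pipeline and optimizer setting. The normalization statistics, learning rate, and momentum below are common CIFAR-10 defaults rather than the exact values in our released code, and the linear model is only a stand-in for SNN-ResNet20.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# RandomCrop + RandomHorizontalFlip + normalization, as used for Table II.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),   # common CIFAR-10 mean
                         (0.2470, 0.2435, 0.2616)),  # common CIFAR-10 std
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True, num_workers=4)

model = torch.nn.Linear(3 * 32 * 32, 10)  # stand-in; use SNN-ResNet20 here

# Weight decay of 1e-4 as recommended; lr/momentum are placeholders.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
```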

Thank you for the provided explanation. However, I am still unable to achieve 85% accuracy using the previous hyperparameters. Could you please provide the model parameters for ResNet20 that you used, or share the relevant code?

Sure. Here are the relevant training code and pre-trained weights for ResNet20, with an accuracy of 88.38%: link

Great! Thank you very much for your assistance. My experiment now reaches comparable accuracy. I have one more question regarding the Batch Normalization layer settings: what is the reason for initializing BatchNorm3d2 to 0.2*thresh?

Initializing this affine parameter close to zero gives the model a clean, branch-less starting point (only the shortcut) and thus provides faster convergence. You may refer to Section V.B of our paper for more details.
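For illustration, a minimal sketch of that initialization, assuming thresh is the neuron's firing threshold; the 2D BatchNorm and variable name below are stand-ins for the repo's BatchNorm3d2 layer.

```python
import torch.nn as nn

thresh = 0.5  # example firing threshold; use the value from your own config

bn2 = nn.BatchNorm2d(64)                     # stand-in for BatchNorm3d2
nn.init.constant_(bn2.weight, 0.2 * thresh)  # scale the affine gamma toward zero
# The affine bias defaults to zero, so the residual branch contributes almost
# nothing at initialization and each block starts out close to its shortcut path.
```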