YyzHarry/ME-Net

Could you provide hyperparameters for MNIST?

Magallan1229 opened this issue · 6 comments

I'm trying to reproduce this work but can't get the results the paper reports. For example, clean images should reach 96.8% top-1 accuracy with ME-Net (p: 0.2–0.4), but I only get 92.2%. Since the configs aren't spelled out, I suspect my implementation is at fault. Could you provide the configuration for MNIST? Here is mine, if it helps.

augment: True
batch-size: 200
optimizer: Adam(model.parameters(), lr=0.0001)
svd-prob: 0.8
start-p: 0.2
end-p: 0.4
epoch: 100
mask-num: 10
me-type: usvt
model:
self.conv1 = nn.Conv2d(1, 32, 5, padding=2)
self.conv2 = nn.Conv2d(32, 64, 5, padding=2)
self.fc1 = nn.Linear(64*7*7, 1024)
self.fc2 = nn.Linear(1024, 10)
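For completeness, here is the full module I'm using. The 2×2 max-pooling after each conv block is my own choice; it's what makes the 64\*7\*7 flatten size work out for 28×28 inputs.

```python
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    """LeNet-style CNN for 1x28x28 MNIST inputs."""
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 5, padding=2)   # 1x28x28 -> 32x28x28
        self.conv2 = nn.Conv2d(32, 64, 5, padding=2)  # 32x14x14 -> 64x14x14
        self.fc1 = nn.Linear(64 * 7 * 7, 1024)
        self.fc2 = nn.Linear(1024, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> 32x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 64x7x7
        x = x.view(x.size(0), -1)                   # flatten: 64*7*7 = 3136
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```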

Hi, thanks for your interest in our work.

For the training hyper-parameters on each dataset, you can find all the relevant values in Appendix B, Table 8 of our paper. In short, for MNIST we use SGD with momentum 0.9, LR = 0.01, and LR decay at epochs 100 and 150.
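In PyTorch that corresponds roughly to the following. The decay factor and total epoch count aren't stated above, so the `gamma=0.1` and `range(200)` here are placeholder assumptions; check Table 8 for the exact values.

```python
import torch.optim as optim

# SGD with momentum, LR dropped at epochs 100 and 150
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150],
                                           gamma=0.1)  # decay factor assumed

for epoch in range(200):               # total epoch count assumed
    train_one_epoch(model, optimizer)  # hypothetical per-epoch training helper
    scheduler.step()
```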

Thank you for your reply, but the LeNet model you provide needs inputs of size 3\*32\*32, which seems to match CIFAR-10, not MNIST (1\*28\*28). Do I need to resize the input or modify the model to match?

Oh, good question. I just checked the implementation on MNIST and found that the LeNet model I used is actually the same as the one you used (from a quick check, it seems to be the standard one in the literature). So no worries about my previous comment on the architecture. :)

And should I normalize images into [0, 1], or use the mean and std to transform the inputs as (inputs - mean) / std?

In the paper you said, "We then randomly flip the images horizontally and normalize them into [0, 1]", but the code in train_pure.py composes its transforms from ToTensor() and Normalize(), which yields inputs outside the [0, 1] range. Forgive all my questions; my device takes a long time to run CNN code, so I need to get things right up front to avoid wasting days. Thank you again for your reply.
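Concretely, the two preprocessing options I'm deciding between look like this. The 0.1307/0.3081 values are the MNIST mean/std commonly used in PyTorch examples; I'm not sure they match what train_pure.py uses.

```python
import torchvision.transforms as T

# Option A: only scale pixels to [0, 1], as the paper describes
transform_paper = T.Compose([T.ToTensor()])

# Option B: additionally standardize with dataset mean/std, as in train_pure.py;
# this pushes inputs outside [0, 1]
transform_code = T.Compose([
    T.ToTensor(),
    T.Normalize((0.1307,), (0.3081,)),  # assumed MNIST stats, not from the repo
])
```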

For the pure training of ME-Net, whether you do the normalization or not shouldn't affect the results much. So either way is fine.