YyzHarry/ME-Net

Could you provide hyperparameters for MNIST?

Magallan1229 opened this issue · 6 comments

I'm trying to reproduce this work but can't get the results the paper reports. For example, clean images should reach 96.8% top-1 accuracy with ME-Net (p: 0.2–0.4), but I only get 92.2%. Since the configs aren't spelled out, I suspect my implementation is at fault. Could you provide the configuration for MNIST? Here is mine, if it helps.

augment: True
batch-size: 200
optimizer: Adam(model.parameters(), lr=0.0001)
svd-prob: 0.8
start-p: 0.2
end-p: 0.4
epoch: 100
mask-num: 10
me-type: usvt
model:
self.conv1 = nn.Conv2d(1, 32, 5, padding=2)
self.conv2 = nn.Conv2d(32, 64, 5, padding=2)
self.fc1 = nn.Linear(64*7*7, 1024)
self.fc2 = nn.Linear(1024, 10)
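For completeness, here is the full module I'm using. The 2×2 max-pooling after each conv block is my own choice; it's what makes the 64\*7\*7 flatten size work out for 28×28 inputs.

```python
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    """LeNet-style CNN for 1x28x28 MNIST inputs."""
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 5, padding=2)   # 1x28x28 -> 32x28x28
        self.conv2 = nn.Conv2d(32, 64, 5, padding=2)  # 32x14x14 -> 64x14x14
        self.fc1 = nn.Linear(64 * 7 * 7, 1024)
        self.fc2 = nn.Linear(1024, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> 32x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 64x7x7
        x = x.view(x.size(0), -1)                   # flatten: 64*7*7 = 3136
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```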

Hi, thanks for your interest in our work.

For the training hyper-parameters on each dataset, you can find all the relevant values in Appendix B, Table 8 of our paper. In short, for MNIST we use SGD with momentum 0.9, LR = 0.01, and LR decay at epochs 100 and 150.
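In PyTorch that corresponds roughly to the following. The decay factor and total epoch count aren't stated above, so the `gamma=0.1` and `range(200)` here are placeholder assumptions; check Table 8 for the exact values.

```python
import torch.optim as optim

# SGD with momentum, LR dropped at epochs 100 and 150
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150],
                                           gamma=0.1)  # decay factor assumed

for epoch in range(200):               # total epoch count assumed
    train_one_epoch(model, optimizer)  # hypothetical per-epoch training helper
    scheduler.step()
```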

Thank you for your reply, but the LeNet model you provide needs inputs of size 3\*32\*32, which seems to match CIFAR-10, not MNIST (1\*28\*28). Do I need to resize the input or modify the model to match?

Oh, good question. I just checked the implementation on MNIST and found that the LeNet model I used is actually the same as the one you used (from a quick check, it seems to be the standard one in the literature). So no worries about my previous comment on the architecture. :)

And should I normalize images into [0, 1], or use the mean and std to transform the inputs as (inputs - mean) / std?

In the paper you said, "We then randomly flip the images horizontally and normalize them into [0, 1]", but the code in train_pure.py composes its transforms from ToTensor() and Normalize(), which yields inputs outside the [0, 1] range. Forgive all my questions; my device takes a long time to run CNN code, so I need to get things right up front to avoid wasting days. Thank you again for your reply.
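Concretely, the two preprocessing options I'm deciding between look like this. The 0.1307/0.3081 values are the MNIST mean/std commonly used in PyTorch examples; I'm not sure they match what train_pure.py uses.

```python
import torchvision.transforms as T

# Option A: only scale pixels to [0, 1], as the paper describes
transform_paper = T.Compose([T.ToTensor()])

# Option B: additionally standardize with dataset mean/std, as in train_pure.py;
# this pushes inputs outside [0, 1]
transform_code = T.Compose([
    T.ToTensor(),
    T.Normalize((0.1307,), (0.3081,)),  # assumed MNIST stats, not from the repo
])
```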

For the pure training of ME-Net, whether you do the normalization or not shouldn't affect the results much. So either way is fine.