question
Slawlight opened this issue · 4 comments
When I use my own ResNet-56 model, it performs very poorly at the start of the Tock step.
I train my model with SGD, with learning rate, momentum, and weight decay set to 0.1, 0.9, and 1e-4. The learning rate is multiplied by 0.1 at epochs 80 and 120. The training settings are the same as the ones you mention in your paper.
During training, once the learning rate has decayed to 0.01, the model reaches at least 92.8% test accuracy. But your first Tock step shows an accuracy below 91% at a learning rate of 0.01, which I think is impossible.
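For reference, the step schedule described above can be sketched as a small function. This is only an illustration: `lr_at_epoch` is a hypothetical helper, and it assumes 0-based epochs with the decay applied from the milestone epoch onward (the convention used by `torch.optim.lr_scheduler.MultiStepLR`).

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Learning rate for a 0-based epoch under a step schedule (hypothetical helper)."""
    # Count the milestones already reached, then apply gamma once per milestone.
    num_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** num_decays


# Print the learning rate just before and just after each milestone.
for epoch in (0, 79, 80, 119, 120, 159):
    print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.4f}")
```

Under these assumptions the learning rate is 0.01 from epoch 80 through epoch 119, which is the phase the accuracy comparison above refers to.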
Apologies for the late response; I have been busy these days. We did not describe our weight decay in the paper because of the page limit. Our ResNet-56 experiments actually use a weight decay of 5e-4, including for training the baseline model. Our baseline is trained with the following configuration, which is presented in run/resnet-56.ipynb. We will upload the baseline model training notebook later.
{
    "base": {
        "task_name": "resnet56_cifar10_ticktock",
        "cuda": True,
        "seed": 0,
        "checkpoint_path": "",
        "model_saving_interval": 160,
        "epoch": 0,
        "multi_gpus": True,
        "fp16": False
    },
    "model": {
        "name": "cifar.resnet56",
        "num_class": 10,
        "pretrained": False
    },
    "train": {
        "trainer": "normal",
        "max_epoch": 160,
        "optim": "sgd",
        "steplr": [
            [80, 0.1],
            [120, 0.01],
            [160, 0.001]
        ],
        "weight_decay": 5e-4,
        "momentum": 0.9,
        "nesterov": False
    },
    "data": {
        "type": "cifar10",
        "shuffle": True,
        "batch_size": 128,
        "test_batch_size": 128,
        "num_workers": 4
    },
    "loss": {
        "criterion": "softmax"
    }
}
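To give a rough feel for why 5e-4 versus 1e-4 matters, here is a minimal scalar sketch of an SGD step with momentum and L2 weight decay. It is not the repository's trainer: `sgd_step` and `shrink_with_zero_grad` are hypothetical names, and the update rule assumes the common formulation in which the decay term is added to the gradient before the momentum update (no dampening, no Nesterov).

```python
def sgd_step(w, grad, buf, lr, momentum=0.9, weight_decay=5e-4):
    """One scalar SGD step with momentum and L2 weight decay (illustrative only)."""
    g = grad + weight_decay * w   # add the weight decay term to the gradient
    buf = momentum * buf + g      # update the momentum buffer
    w = w - lr * buf              # apply the parameter update
    return w, buf


def shrink_with_zero_grad(weight_decay, steps=1000, lr=0.1):
    """Track how a weight of 1.0 shrinks when the loss gradient is always zero."""
    w, buf = 1.0, 0.0
    for _ in range(steps):
        w, buf = sgd_step(w, 0.0, buf, lr, weight_decay=weight_decay)
    return w


# Compare the setting from the question (1e-4) with the one in the config (5e-4).
for wd in (1e-4, 5e-4):
    w_final = shrink_with_zero_grad(wd)
    print(f"weight_decay={wd:g}: w after 1000 zero-gradient steps = {w_final:.4f}")
```

The larger value pulls weights toward zero noticeably faster, so the two baselines are not trained under the same regularization even though the learning rate schedule matches.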
So what about the weight decay setting for training ResNet-50 on ImageNet?
Default setting from:
https://github.com/pytorch/examples/blob/master/imagenet/main.py
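As a hedged sketch of what those defaults look like: the values below (learning rate 0.1, momentum 0.9, weight decay 1e-4) are what I recall the linked script using, so treat them as an assumption and check the current file before relying on them. The parser here is a stand-in written for illustration, not a copy of the script.

```python
import argparse

# Stand-in parser for illustration; default values are assumed, verify against the linked script.
parser = argparse.ArgumentParser(description="Sketch of ImageNet training defaults")
parser.add_argument("--lr", "--learning-rate", default=0.1, type=float, dest="lr")
parser.add_argument("--momentum", default=0.9, type=float)
parser.add_argument("--wd", "--weight-decay", default=1e-4, type=float, dest="weight_decay")

# Parse an empty argument list so that only the defaults are used.
args = parser.parse_args([])
print(f"lr={args.lr}, momentum={args.momentum}, weight_decay={args.weight_decay}")
```

If these values hold, the ResNet-50 runs would use a weight decay of 1e-4, unlike the 5e-4 used for ResNet-56 on CIFAR-10.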
Thanks a lot