youzhonghui/gate-decorator-pruning

question

Slawlight opened this issue · 4 comments

When I use my own ResNet-56 model, it gets very poor performance at the start of the Tock step.
I train the model with SGD, with the learning rate, momentum, and weight decay set to 0.1, 0.9, and 1e-4. The learning rate is multiplied by 0.1 at epochs 80 and 120. These training settings are the same as the ones mentioned in your paper.
[Two screenshots of the training output attached]
During training, once the learning rate decays to 0.01 the model reaches at least 92.8% test accuracy. But your first Tock step shows accuracy lower than 91% with the learning rate at 0.01, which seems impossible to me.
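For reference, the baseline training setup described above corresponds to something like the following PyTorch sketch. The model here is a placeholder, not the actual ResNet-56; only the optimizer and schedule values come from the description above.

    import torch
    from torch import nn, optim
    from torch.optim.lr_scheduler import MultiStepLR

    # Placeholder model; in practice this would be the custom ResNet-56.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

    # SGD with lr=0.1, momentum=0.9, weight decay=1e-4, as described above.
    optimizer = optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=1e-4)

    # Multiply the learning rate by 0.1 at epochs 80 and 120.
    scheduler = MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)

    for epoch in range(160):
        # ... one training epoch on CIFAR-10 ...
        scheduler.step()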

Apologies for the late response; I have been busy these days. We didn't describe our weight decay setting in the paper due to the page limit. Our ResNet-56 experiments actually set the weight decay to 5e-4, including when training the baseline model. The baseline was trained with the following configuration, which is presented in run/resnet-56.ipynb. We will upload the baseline training notebook later.

{
    "base": {
        "task_name": "resnet56_cifar10_ticktock",
        "cuda": True,
        "seed": 0,
        "checkpoint_path": "",
        "model_saving_interval": 160,
        "epoch": 0,
        "multi_gpus": True,
        "fp16": False
    },
    "model": {
        "name": "cifar.resnet56",
        "num_class": 10,
        "pretrained": False
    },
    "train": {
        "trainer": "normal",
        "max_epoch": 160,
        "optim": "sgd",
        "steplr": [
            [80, 0.1],
            [120, 0.01],
            [160, 0.001]
        ],
        "weight_decay": 5e-4,
        "momentum": 0.9,
        "nesterov": False
    },
    "data": {
        "type": "cifar10",
        "shuffle": True,
        "batch_size": 128,
        "test_batch_size": 128,
        "num_workers": 4
    },
    "loss": {
        "criterion": "softmax"
    }
}
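A minimal sketch of how the "train" section above might translate to PyTorch, assuming each [epoch, lr] entry in "steplr" means "use this learning rate until the given epoch" (so lr = 0.1 for epochs 0-79, 0.01 for 80-119, 0.001 for 120-159). The model is a placeholder and this is not the repository's actual trainer code; the key difference from the setup in the question is weight_decay=5e-4 rather than 1e-4.

    import torch
    from torch import nn, optim

    STEPLR = [(80, 0.1), (120, 0.01), (160, 0.001)]

    def lr_at_epoch(epoch, steplr=STEPLR):
        # Assumed interpretation of "steplr": return the learning rate
        # paired with the first epoch boundary not yet reached.
        for boundary, lr in steplr:
            if epoch < boundary:
                return lr
        return steplr[-1][1]

    model = nn.Linear(10, 10)  # placeholder; the real model is cifar.resnet56

    # Weight decay 5e-4 (not 1e-4), momentum 0.9, no Nesterov, per the config.
    optimizer = optim.SGD(model.parameters(), lr=lr_at_epoch(0),
                          momentum=0.9, weight_decay=5e-4, nesterov=False)

    for epoch in range(160):
        for group in optimizer.param_groups:
            group['lr'] = lr_at_epoch(epoch)
        # ... one epoch of training on CIFAR-10 with batch size 128 ...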

So what about the weight decay setting used for training ResNet-50 on ImageNet?

Thanks a lot