Reproducing CIFAR-10 results
Hey @ildoonet. Thanks for your clarification on theconf. I've managed to run Wideresnet28_10 on CIFAR-10 so far, but our results don't match. I got:
"loss_train": 0.7089337987777514,
"loss_valid": 0.0,
"loss_test": 0.10971159720420838,
"top1_train": 0.7357572115384615,
"top1_valid": 0.0,
"top1_test": 0.9627,
"top5_train": 0.8983573717948717,
"top5_valid": 0.0,
"top5_test": 0.9992,
"epoch": 200
Python version: 3.6.9
CUDA version: 10.0
PyTorch version: 1.3.1
What could be the issue?
I've just noticed that the number of workers was set to 0 for the dataloaders. I've rerun the experiments with 32 workers; I'll give an update once they're completed.
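For reference, the change amounts to roughly this (a minimal, self-contained sketch with a placeholder dataset, not the repo's actual data pipeline):

```python
# Illustrative only: bumping DataLoader workers so CPU-side preprocessing/augmentation
# doesn't bottleneck the GPU. The dataset and batch size here are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=128, shuffle=True,
                    num_workers=32, pin_memory=True, drop_last=True)
```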
EDIT: the new results still don't match:
"loss_train": 0.7067553847264021,
"loss_valid": 0.0,
"loss_test": 0.10473138356208801,
"top1_train": 0.7372195512820513,
"top1_valid": 0.0,
"top1_test": 0.965,
"top5_train": 0.8991786858974359,
"top5_valid": 0.0,
"top5_test": 0.9992,
"epoch": 200
Hey @ildoonet
For Issue #9, I modified some code, so currently it might not reproduce the paper's results correctly.
I'm working on reproducing the ImageNet results as well as CIFAR.
I see, no worries. I can see that you've slightly modified the augmentations, specifically translateX, translateY, and Solarize; I can change those back. Also, you added a new augment_list which uses different augmentations; I can comment the old augment_list back in. Is there anything else?
On another note, you've tried increasing the batch size for the wideresnets and shakeshake. You doubled the batch size (128->256) and introduced warmup, so that the learning rate reported in the paper is doubled, but only by the 5th epoch. Have you also experimented with the initial batch size (128)?
I'm aware that this approach of jointly increasing the batch size and learning rate works on ImageNet (e.g. https://arxiv.org/abs/1706.02677); I just haven't seen it used with smaller datasets.
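To make sure I'm understanding the schedule correctly, here's a minimal sketch in plain PyTorch (illustrative values only: base LR 0.1 scaled to 0.2 for the doubled batch, warmup over 5 epochs; the model, optimizer hyperparameters, and post-warmup decay are placeholders, not necessarily what the repo uses):

```python
# Minimal sketch: linear warmup to a scaled learning rate, then cosine decay.
# All hyperparameters here are illustrative, not the repo's actual settings.
import math
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the real network
base_lr, scaled_lr = 0.1, 0.2    # 0.1 from the paper, doubled for batch 128 -> 256
warmup_epochs, total_epochs = 5, 200

optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)

def lr_factor(epoch):
    # Ramp linearly from base_lr to scaled_lr over the warmup epochs...
    if epoch < warmup_epochs:
        return (base_lr + (scaled_lr - base_lr) * epoch / warmup_epochs) / scaled_lr
    # ...then decay with a cosine schedule for the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_factor)

for epoch in range(total_epochs):
    # ... train for one epoch ...
    scheduler.step()
```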
I was able to reproduce the CIFAR-10 results to a satisfactory level (97.2% for wresnet28_10 and 98.45% for pyramidNet). For anyone wanting to replicate them, here's how I did it:
(A) I defined the augmentations using the file that was used by the authors (it can be found at https://github.com/tensorflow/models/blob/master/research/autoaugment/augmentation_transforms.py) and used the list the paper suggests, i.e.:
```python
import numpy as np

# The 14 operations suggested in the paper
random_policy_ops = [
    'Identity', 'AutoContrast', 'Equalize', 'Rotate',
    'Solarize', 'Color', 'Contrast', 'Brightness',
    'Sharpness', 'ShearX', 'TranslateX', 'TranslateY',
    'Posterize', 'ShearY'
]

def random_policy(no_layers, magnitude):
    # Sample `no_layers` ops uniformly at random; each is applied with
    # probability 1.0 at the given magnitude.
    sampled_ops = np.random.choice(random_policy_ops, no_layers)
    return [(op, 1.0, magnitude) for op in sampled_ops]
```
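The sampled policy can then be applied per image with apply_policy from that file (rough sketch only; it assumes augmentation_transforms.py is saved locally and importable, and that the image is already converted/normalized the way the AutoAugment code expects, so adjust the conversion to your pipeline):

```python
# Sketch: apply a freshly sampled random policy to one image using
# apply_policy() from the linked augmentation_transforms.py.
import augmentation_transforms  # the linked file, saved locally

def randaugment(img, no_layers=2, magnitude=9):
    # img: preprocessed numpy image; no_layers/magnitude are the usual N/M knobs
    policy = random_policy(no_layers, magnitude)  # e.g. [('Rotate', 1.0, 9), ('Color', 1.0, 9)]
    return augmentation_transforms.apply_policy(policy, img)
```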
(B) In addition, I changed the batch size to 512 for wresnet28_10 (from 128) and to 1024 for pyramidNet (from 64), and increased the learning rate from 0.1 to 0.4 (wresnet28_10) and from 0.05 to 0.8 (pyramidNet) using warmup over the first 5 epochs.
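(For reference, both learning rates follow the linear scaling rule from the Goyal et al. paper linked above: 0.1 × 512/128 = 0.4 and 0.05 × 1024/64 = 0.8, with the warmup ramping up to the scaled value as in the sketch earlier in this thread.)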
Thanks to Dogus for providing the link to the originally defined augmentations!