I use DDP to train on CIFAR10, but the accuracy is too low
buaacarzp opened this issue · 1 comment
buaacarzp commented
Here is my experiment setup:
exp3 uses the normal single-GPU training method and reaches 0.93+ test accuracy, but exp2 and exp4 (the DDP runs) stay below 0.8. I tried changing the learning rate and batch size, but it didn't perform well. I need some suggestions.
- I tried calling `dist.barrier()` after each training epoch, but the results are not good. Both `net.eval()` and `with torch.no_grad()` are used during evaluation, and the results are still not good. (A sketch of this evaluation pattern follows below.)
Best wishes, please help me. :)
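For reference, a minimal sketch of the evaluation pattern described above, assuming a standard PyTorch DDP setup (`net`, `val_loader`, and `device` are placeholders, and the process group is assumed to be initialized already):

```python
import torch
import torch.distributed as dist

def evaluate(net, val_loader, device):
    dist.barrier()   # synchronize all ranks after the training epoch
    net.eval()       # switch BatchNorm/Dropout to eval mode
    correct, total = 0, 0
    with torch.no_grad():  # no gradient tracking during evaluation
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = net(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    # Sum the counts across ranks so every process reports accuracy over
    # the full validation set (assuming a DistributedSampler shards it).
    stats = torch.tensor([correct, total], dtype=torch.float64, device=device)
    dist.all_reduce(stats, op=dist.ReduceOp.SUM)
    return (stats[0] / stats[1]).item()
```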
BIGBALLON commented
- CIFAR10/100 is a toy dataset; a batch size of 64, 128, or 256 is enough. `dist.barrier()` is not the key point, and a single GPU is enough: a batch size of 1024 is too large for the CIFAR dataset (only 50K training images).
- In summary, use a single GPU for CIFAR10 unless the network is really too large.
- Don't use DDP everywhere unless it's really necessary!
- Check this repo for more details; a minimal single-GPU setup is sketched below.
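A minimal single-GPU CIFAR10 training setup along these lines (the model choice, optimizer, and hyperparameters here are illustrative, not this repo's exact configuration):

```python
import torch
import torchvision
import torchvision.transforms as T

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
# A moderate batch size (64-256) is plenty for CIFAR's 50K training images.
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True, num_workers=4)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = torchvision.models.resnet18(num_classes=10).to(device)  # placeholder model
optimizer = torch.optim.SGD(net.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
```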
Results on CIFAR
Vanilla architectures
architecture | params | batch size | epoch | C10 test acc (%) | C100 test acc (%) |
---|---|---|---|---|---|
Lecun | 62K | 128 | 250 | 67.46 | 34.10 |
alexnet | 2.4M | 128 | 250 | 75.56 | 38.67 |
vgg19 | 20M | 128 | 250 | 93.00 | 72.07 |
preresnet20 | 0.27M | 128 | 250 | 91.88 | 67.03 |
preresnet110 | 1.7M | 128 | 250 | 94.24 | 72.96 |
preresnet1202 | 19.4M | 128 | 250 | 94.74 | 75.28 |
densenet100bc | 0.76M | 64 | 300 | 95.08 | 77.55 |
densenet190bc | 25.6M | 64 | 300 | 96.11 | 82.59 |
resnext29_16x64d | 68.1M | 128 | 300 | 95.94 | 83.18 |
se_resnext29_16x64d | 68.6M | 128 | 300 | 96.15 | 83.65 |
cbam_resnext29_16x64d | 68.7M | 128 | 300 | 96.27 | 83.62 |
ge_resnext29_16x64d | 70.0M | 128 | 300 | 96.21 | 83.57 |
With additional regularization
PS: the default data augmentation methods are `RandomCrop` + `RandomHorizontalFlip` + `Normalize`, and a ✓ marks which additional method is used.
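This default pipeline likely corresponds to something like the following standard torchvision setup (the crop padding and the normalization statistics are assumptions; check the repo's config for the exact values):

```python
import torchvision.transforms as T

# Commonly used CIFAR10 channel statistics (assumed, not from the repo).
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),   # RandomCrop
    T.RandomHorizontalFlip(),      # RandomHorizontalFlip
    T.ToTensor(),
    T.Normalize(MEAN, STD),        # Normalize
])
```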
architecture | epoch | cutout | mixup | C10 test acc (%) |
---|---|---|---|---|
preresnet20 | 250 | | | 91.88 |
preresnet20 | 250 | ✓ | | 92.57 |
preresnet20 | 250 | | ✓ | 92.71 |
preresnet20 | 250 | ✓ | ✓ | 92.66 |
preresnet110 | 250 | | | 94.24 |
preresnet110 | 250 | ✓ | | 94.67 |
preresnet110 | 250 | | ✓ | 94.94 |
preresnet110 | 250 | ✓ | ✓ | 95.66 |
se_resnext29_16x64d | 300 | | | 96.15 |
se_resnext29_16x64d | 300 | ✓ | | 96.60 |
se_resnext29_16x64d | 300 | | ✓ | 96.86 |
se_resnext29_16x64d | 300 | ✓ | ✓ | 97.03 |
cbam_resnext29_16x64d | 300 | ✓ | ✓ | 97.16 |
ge_resnext29_16x64d | 300 | ✓ | ✓ | 97.19 |
-- | -- | -- | -- | -- |
shake_resnet26_2x64d | 1800 | | | 96.94 |
shake_resnet26_2x64d | 1800 | ✓ | | 97.20 |
shake_resnet26_2x64d | 1800 | | ✓ | 97.42 |
shake_resnet26_2x64d | 1800 | ✓ | ✓ | 97.71 |
PS: `shake_resnet26_2x64d` achieved 97.71% test accuracy with cutout and mixup!! It's cool, right?
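For context, mixup blends pairs of training examples and their labels. A minimal sketch of the usual formulation (the `alpha=1.0` Beta parameter is illustrative, not necessarily this repo's setting):

```python
import numpy as np
import torch

def mixup_batch(images, labels, alpha=1.0):
    """Blend a batch with a shuffled copy of itself (mixup)."""
    lam = np.random.beta(alpha, alpha)       # mixing coefficient
    index = torch.randperm(images.size(0))   # random pairing within the batch
    mixed = lam * images + (1 - lam) * images[index]
    return mixed, labels, labels[index], lam

# The loss is mixed with the same coefficient:
#   loss = lam * criterion(out, y_a) + (1 - lam) * criterion(out, y_b)
```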
With different LR schedulers
architecture | epoch | step decay | cosine | htd(-6,3) | cutout | mixup | C10 test acc (%) |
---|---|---|---|---|---|---|---|
preresnet20 | 250 | ✓ | | | | | 91.88 |
preresnet20 | 250 | | ✓ | | | | 92.13 |
preresnet20 | 250 | | | ✓ | | | 92.44 |
preresnet20 | 250 | | | ✓ | ✓ | ✓ | 93.30 |
preresnet110 | 250 | ✓ | | | | | 94.24 |
preresnet110 | 250 | | ✓ | | | | 94.48 |
preresnet110 | 250 | | | ✓ | | | 94.82 |
preresnet110 | 250 | | | ✓ | ✓ | ✓ | 95.88 |
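For reference, a sketch of how two of these schedulers could be set up in PyTorch (the model and base learning rate are placeholders; the htd formula shown is an assumption based on the hyperbolic-tangent decay paper, not this repo's exact code):

```python
import math
import torch

net = torch.nn.Linear(32 * 32 * 3, 10)  # placeholder model
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
T = 250  # total epochs, matching the tables above

# Cosine annealing, built into PyTorch:
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=T)

# htd(-6, 3): hyperbolic-tangent decay written as a LambdaLR factor, from
# lr_t = lr_0/2 * (1 - tanh(L + (U - L) * t / T)) with L=-6, U=3:
# scheduler = torch.optim.lr_scheduler.LambdaLR(
#     optimizer, lr_lambda=lambda t: 0.5 * (1 - math.tanh(-6 + 9 * t / T)))

for epoch in range(T):
    # train_one_epoch(net, optimizer, ...)  # placeholder training step
    scheduler.step()
```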