BIGBALLON/distribuuuu

bro, I use ddp to train CIFAR10, the acc is too low

buaacarzp opened this issue · 1 comment

Here is my experiment:
[image: experiment results table]
exp3 uses the normal single-GPU training method and reaches 0.93+ accuracy, but exp2 and exp4 only reach 0.8-. I tried changing the lr and batch size, but it didn't perform well. I need some suggestions.

  1. I tried calling dist.barrier after each training epoch; the results were still not good.
  2. Both net.eval() and with torch.no_grad() are used for evaluation (a minimal sketch of the loop I mean follows this list); the results are still not good.
    Best wishes to you, please help me. :)
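To be concrete, the evaluation loop I mean looks roughly like this. It is only a minimal sketch, assuming torch.distributed is already initialized and the test loader uses a DistributedSampler; the function name and structure are illustrative, not this repo's code.

```python
import torch
import torch.distributed as dist

@torch.no_grad()
def evaluate(net, loader, device):
    net.eval()  # BatchNorm/Dropout switch to inference behavior
    correct = torch.zeros(1, device=device)
    total = torch.zeros(1, device=device)
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = net(images).argmax(dim=1)
        correct += (preds == labels).sum()
        total += labels.size(0)
    # Sum the per-rank counts so every process reports accuracy over the
    # whole test set, not just over its own shard.
    # (Note: DistributedSampler may pad/duplicate a few samples to make the
    # shards even, which can shift the number slightly.)
    dist.all_reduce(correct, op=dist.ReduceOp.SUM)
    dist.all_reduce(total, op=dist.ReduceOp.SUM)
    net.train()
    return (correct / total).item()
```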
  1. CIFAR10/100 is a toy dataset; a batch size of 64, 128, or 256 is enough.
  2. dist.barrier is not the key point, and a single GPU is enough. A batch size of 1024 is too large for the CIFAR dataset (only 50K images); if you keep a large batch anyway, the learning rate usually has to be scaled with it (a sketch follows this list).
  3. In summary, use a single GPU for CIFAR10 unless the network is too large.
  4. Don't use DDP everywhere unless it is really necessary!
  5. Check this repo for more details.
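If you do keep the large effective batch size under DDP, the usual companion is the linear learning-rate scaling heuristic (Goyal et al., 2017). A minimal sketch with illustrative defaults, not this repo's actual settings:

```python
import torch
import torch.distributed as dist

def make_optimizer(net, base_lr=0.1, base_batch_size=128, per_gpu_batch_size=128):
    """Scale the LR linearly with the effective (global) batch size under DDP."""
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    effective_batch_size = per_gpu_batch_size * world_size
    scaled_lr = base_lr * effective_batch_size / base_batch_size
    return torch.optim.SGD(net.parameters(), lr=scaled_lr,
                           momentum=0.9, weight_decay=5e-4)
```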

Results on CIFAR

Vanilla architectures

| architecture | params | batch size | epoch | C10 test acc (%) | C100 test acc (%) |
| --- | --- | --- | --- | --- | --- |
| lecun | 62K | 128 | 250 | 67.46 | 34.10 |
| alexnet | 2.4M | 128 | 250 | 75.56 | 38.67 |
| vgg19 | 20M | 128 | 250 | 93.00 | 72.07 |
| preresnet20 | 0.27M | 128 | 250 | 91.88 | 67.03 |
| preresnet110 | 1.7M | 128 | 250 | 94.24 | 72.96 |
| preresnet1202 | 19.4M | 128 | 250 | 94.74 | 75.28 |
| densenet100bc | 0.76M | 64 | 300 | 95.08 | 77.55 |
| densenet190bc | 25.6M | 64 | 300 | 96.11 | 82.59 |
| resnext29_16x64d | 68.1M | 128 | 300 | 95.94 | 83.18 |
| se_resnext29_16x64d | 68.6M | 128 | 300 | 96.15 | 83.65 |
| cbam_resnext29_16x64d | 68.7M | 128 | 300 | 96.27 | 83.62 |
| ge_resnext29_16x64d | 70.0M | 128 | 300 | 96.21 | 83.57 |

With additional regularization

PS: the default data augmentation methods are RandomCrop + RandomHorizontalFlip + Normalize,
and √ marks which additional methods are used. 🍰
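For reference, that baseline pipeline maps to torchvision transforms roughly as follows; the CIFAR-10 mean/std values here are the commonly quoted ones and may differ slightly from the repo's own constants.

```python
import torchvision.transforms as T

# Commonly used CIFAR-10 statistics (approximate, assumed here).
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),             # RandomCrop
    T.RandomHorizontalFlip(),                # RandomHorizontalFlip
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),  # Normalize
])
```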

| architecture | epoch | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| preresnet20 | 250 | | | 91.88 |
| preresnet20 | 250 | √ | | 92.57 |
| preresnet20 | 250 | | √ | 92.71 |
| preresnet20 | 250 | √ | √ | 92.66 |
| preresnet110 | 250 | | | 94.24 |
| preresnet110 | 250 | √ | | 94.67 |
| preresnet110 | 250 | | √ | 94.94 |
| preresnet110 | 250 | √ | √ | 95.66 |
| se_resnext29_16x64d | 300 | | | 96.15 |
| se_resnext29_16x64d | 300 | √ | | 96.60 |
| se_resnext29_16x64d | 300 | | √ | 96.86 |
| se_resnext29_16x64d | 300 | √ | √ | 97.03 |
| cbam_resnext29_16x64d | 300 | √ | √ | 97.16 |
| ge_resnext29_16x64d | 300 | √ | √ | 97.19 |
| shake_resnet26_2x64d | 1800 | | | 96.94 |
| shake_resnet26_2x64d | 1800 | √ | | 97.20 |
| shake_resnet26_2x64d | 1800 | | √ | 97.42 |
| shake_resnet26_2x64d | 1800 | √ | √ | 97.71 |

PS: shake_resnet26_2x64d achieved 97.71% test accuracy with cutout and mixup!!
It's cool, right?
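For readers unfamiliar with the two regularizers: cutout masks out a random square patch of the input image, while mixup blends pairs of images and their labels. A minimal mixup sketch (Zhang et al., 2018); the repo's own implementation may differ in details:

```python
import numpy as np
import torch

def mixup_data(x, y, alpha=1.0):
    """Blend each sample with a randomly chosen partner from the same batch."""
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, y, y[index], lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    # Apply the same convex combination to the two label sets.
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```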

With different LR schedulers

| architecture | epoch | step decay | cosine | htd(-6,3) | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| preresnet20 | 250 | √ | | | | | 91.88 |
| preresnet20 | 250 | | √ | | | | 92.13 |
| preresnet20 | 250 | | | √ | | | 92.44 |
| preresnet20 | 250 | | | √ | √ | √ | 93.30 |
| preresnet110 | 250 | √ | | | | | 94.24 |
| preresnet110 | 250 | | √ | | | | 94.48 |
| preresnet110 | 250 | | | √ | | | 94.82 |
| preresnet110 | 250 | | | √ | √ | √ | 95.88 |
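The htd(-6,3) column refers to hyperbolic-tangent decay with bounds L = -6 and U = 3. Below is a sketch of that schedule next to PyTorch's built-in cosine annealing; the exact parameterization used in the repo is an assumption here.

```python
import math
import torch

def htd_lambda(total_epochs, lower=-6.0, upper=3.0):
    """Hyperbolic-tangent decay: lr_t = lr_0 * (1 - tanh(L + (U - L) * t / T)) / 2."""
    def fn(epoch):
        return 0.5 * (1.0 - math.tanh(lower + (upper - lower) * epoch / total_epochs))
    return fn

# Usage sketch (optimizer and epoch count are placeholders):
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=htd_lambda(250))
# Cosine alternative:
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=250)
```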