BIGBALLON/distribuuuu

bro, I use ddp to train CIFAR10, the acc is too low

buaacarzp opened this issue · 1 comment

Here is my experiment:
[image: experiment results table]
exp3 uses the normal single-GPU training method and reaches 0.93+ accuracy, but exp2 and exp4 only reach 0.8-. I tried changing the lr and batch size, but it didn't perform well. I need some suggestions.

  1. I tried calling dist.barrier after each training epoch; the results were still not good.
  2. Both net.eval() and with torch.no_grad() are used for evaluation (a minimal sketch of the loop I mean follows this list); the results are still not good.
    Best wishes to you, please help me. :)
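To be concrete, the evaluation loop I mean looks roughly like this. It is only a minimal sketch, assuming torch.distributed is already initialized and the test loader uses a DistributedSampler; the function name and structure are illustrative, not this repo's code.

```python
import torch
import torch.distributed as dist

@torch.no_grad()
def evaluate(net, loader, device):
    net.eval()  # BatchNorm/Dropout switch to inference behavior
    correct = torch.zeros(1, device=device)
    total = torch.zeros(1, device=device)
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = net(images).argmax(dim=1)
        correct += (preds == labels).sum()
        total += labels.size(0)
    # Sum the per-rank counts so every process reports accuracy over the
    # whole test set, not just over its own shard.
    # (Note: DistributedSampler may pad/duplicate a few samples to make the
    # shards even, which can shift the number slightly.)
    dist.all_reduce(correct, op=dist.ReduceOp.SUM)
    dist.all_reduce(total, op=dist.ReduceOp.SUM)
    net.train()
    return (correct / total).item()
```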
  1. CIFAR10/100 is a toy dataset; a batch size of 64, 128, or 256 is enough.
  2. dist.barrier is not the key point, and a single GPU is enough. A batch size of 1024 is too large for the CIFAR dataset (only 50K images); if you keep a large batch anyway, the learning rate usually has to be scaled with it (a sketch follows this list).
  3. In summary, use a single GPU for CIFAR10 unless the network is too large.
  4. Don't use DDP everywhere unless it is really necessary!
  5. Check this repo for more details.
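If you do keep the large effective batch size under DDP, the usual companion is the linear learning-rate scaling heuristic (Goyal et al., 2017). A minimal sketch with illustrative defaults, not this repo's actual settings:

```python
import torch
import torch.distributed as dist

def make_optimizer(net, base_lr=0.1, base_batch_size=128, per_gpu_batch_size=128):
    """Scale the LR linearly with the effective (global) batch size under DDP."""
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    effective_batch_size = per_gpu_batch_size * world_size
    scaled_lr = base_lr * effective_batch_size / base_batch_size
    return torch.optim.SGD(net.parameters(), lr=scaled_lr,
                           momentum=0.9, weight_decay=5e-4)
```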

Results on CIFAR

Vanilla architectures

| architecture | params | batch size | epoch | C10 test acc (%) | C100 test acc (%) |
| --- | --- | --- | --- | --- | --- |
| lecun | 62K | 128 | 250 | 67.46 | 34.10 |
| alexnet | 2.4M | 128 | 250 | 75.56 | 38.67 |
| vgg19 | 20M | 128 | 250 | 93.00 | 72.07 |
| preresnet20 | 0.27M | 128 | 250 | 91.88 | 67.03 |
| preresnet110 | 1.7M | 128 | 250 | 94.24 | 72.96 |
| preresnet1202 | 19.4M | 128 | 250 | 94.74 | 75.28 |
| densenet100bc | 0.76M | 64 | 300 | 95.08 | 77.55 |
| densenet190bc | 25.6M | 64 | 300 | 96.11 | 82.59 |
| resnext29_16x64d | 68.1M | 128 | 300 | 95.94 | 83.18 |
| se_resnext29_16x64d | 68.6M | 128 | 300 | 96.15 | 83.65 |
| cbam_resnext29_16x64d | 68.7M | 128 | 300 | 96.27 | 83.62 |
| ge_resnext29_16x64d | 70.0M | 128 | 300 | 96.21 | 83.57 |

With additional regularization

PS: the default data augmentation methods are RandomCrop + RandomHorizontalFlip + Normalize,
and √ marks which additional methods are used. 🍰
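For reference, that baseline pipeline maps to torchvision transforms roughly as follows; the CIFAR-10 mean/std values here are the commonly quoted ones and may differ slightly from the repo's own constants.

```python
import torchvision.transforms as T

# Commonly used CIFAR-10 statistics (approximate, assumed here).
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),             # RandomCrop
    T.RandomHorizontalFlip(),                # RandomHorizontalFlip
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),  # Normalize
])
```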

| architecture | epoch | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| preresnet20 | 250 | | | 91.88 |
| preresnet20 | 250 | √ | | 92.57 |
| preresnet20 | 250 | | √ | 92.71 |
| preresnet20 | 250 | √ | √ | 92.66 |
| preresnet110 | 250 | | | 94.24 |
| preresnet110 | 250 | √ | | 94.67 |
| preresnet110 | 250 | | √ | 94.94 |
| preresnet110 | 250 | √ | √ | 95.66 |
| se_resnext29_16x64d | 300 | | | 96.15 |
| se_resnext29_16x64d | 300 | √ | | 96.60 |
| se_resnext29_16x64d | 300 | | √ | 96.86 |
| se_resnext29_16x64d | 300 | √ | √ | 97.03 |
| cbam_resnext29_16x64d | 300 | √ | √ | 97.16 |
| ge_resnext29_16x64d | 300 | √ | √ | 97.19 |
| shake_resnet26_2x64d | 1800 | | | 96.94 |
| shake_resnet26_2x64d | 1800 | √ | | 97.20 |
| shake_resnet26_2x64d | 1800 | | √ | 97.42 |
| shake_resnet26_2x64d | 1800 | √ | √ | 97.71 |

PS: shake_resnet26_2x64d achieved 97.71% test accuracy with cutout and mixup!!
It's cool, right?
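For readers unfamiliar with the two regularizers: cutout masks out a random square patch of the input image, while mixup blends pairs of images and their labels. A minimal mixup sketch (Zhang et al., 2018); the repo's own implementation may differ in details:

```python
import numpy as np
import torch

def mixup_data(x, y, alpha=1.0):
    """Blend each sample with a randomly chosen partner from the same batch."""
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, y, y[index], lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    # Apply the same convex combination to the two label sets.
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```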

With different LR schedulers

| architecture | epoch | step decay | cosine | htd(-6,3) | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| preresnet20 | 250 | √ | | | | | 91.88 |
| preresnet20 | 250 | | √ | | | | 92.13 |
| preresnet20 | 250 | | | √ | | | 92.44 |
| preresnet20 | 250 | | | √ | √ | √ | 93.30 |
| preresnet110 | 250 | √ | | | | | 94.24 |
| preresnet110 | 250 | | √ | | | | 94.48 |
| preresnet110 | 250 | | | √ | | | 94.82 |
| preresnet110 | 250 | | | √ | √ | √ | 95.88 |
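The htd(-6,3) column refers to hyperbolic-tangent decay with bounds L = -6 and U = 3. Below is a sketch of that schedule next to PyTorch's built-in cosine annealing; the exact parameterization used in the repo is an assumption here.

```python
import math
import torch

def htd_lambda(total_epochs, lower=-6.0, upper=3.0):
    """Hyperbolic-tangent decay: lr_t = lr_0 * (1 - tanh(L + (U - L) * t / T)) / 2."""
    def fn(epoch):
        return 0.5 * (1.0 - math.tanh(lower + (upper - lower) * epoch / total_epochs))
    return fn

# Usage sketch (optimizer and epoch count are placeholders):
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=htd_lambda(250))
# Cosine alternative:
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=250)
```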