ShuffleNetV2+ does not converge when shuffle=False is set in train_loader
songkq opened this issue · 4 comments
Parameters:
model-size=Large, auto_continue=False, batch-size=128, num_workers=8, and other default params.
Environment:
Ubuntu 16.04, PyTorch 1.2, single RTX 2080 Ti GPU
When I train ShuffleNetV2+ on the ImageNet-1K dataset, it does not converge when shuffle=False is set in the train_loader. Could you give some advice?
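For reference, a minimal sketch of the loader setup in question. The dataset here is a hypothetical stand-in (random tensors instead of ImageNet, shrunk shapes so it runs anywhere); the reported run used batch-size=128 and num_workers=8, and the flag under discussion is `shuffle`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the ImageNet-1K training set,
# shrunk so this sketch runs anywhere.
images = torch.randn(32, 3, 8, 8)
labels = torch.randint(0, 1000, (32,))
dataset = TensorDataset(images, labels)

# The setting being debugged: shuffle=False feeds samples in storage
# order (the reported run used batch_size=128, num_workers=8).
train_loader = DataLoader(dataset, batch_size=8, shuffle=False, num_workers=0)

first_images, first_labels = next(iter(train_loader))
print(first_images.shape)  # torch.Size([8, 3, 8, 8])
```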
[30 02:15:34] TRAIN Iter 20: lr = 0.499978, loss = 2.621804, Top-1 err = 0.162891, Top-5 err = 0.145703, data_time = 0.006651, train_time = 2.031535
[30 02:16:09] TRAIN Iter 40: lr = 0.499956, loss = 4.062627, Top-1 err = 0.154688, Top-5 err = 0.096875, data_time = 0.006521, train_time = 1.751516
[30 02:16:41] TRAIN Iter 60: lr = 0.499933, loss = 48.428082, Top-1 err = 0.465625, Top-5 err = 0.323047, data_time = 0.006557, train_time = 1.589813
[30 02:16:47] TRAIN Iter 80: lr = 0.499911, loss = nan, Top-1 err = 0.628516, Top-5 err = 0.564063, data_time = 0.006574, train_time = 0.323313
[30 02:16:52] TRAIN Iter 100: lr = 0.499889, loss = nan, Top-1 err = 1.000000, Top-5 err = 1.000000, data_time = 0.006572, train_time = 0.255579
Hi @ipScore ,
From the Top-1 err it looks like you are fine-tuning; if you modified the model, please train from scratch.
Hi @nmaac ,
Not really, I do train from scratch. When I set shuffle=True in the train_loader, everything is OK. However, it fails to converge with shuffle=False in the train_loader. That seems a little strange.
I misunderstood the "shuffle" you mentioned; I was referring to the channel shuffle operator in the block.
However, if you mean shuffling of the training data, that is a standard configuration, since the training samples are stored sorted by class. We do not suggest changing it to False.