ResNet-164 on CIFAR-10
I have tested this repository on CIFAR-10 with ResNet-164.
I decreased the learning rate by a factor of 10 at epochs 160, 180, and 200. The initial lr is 0.1.
The best top-1 accuracy is 94.77% (5.23% error), which is better than the result in Kaiming's paper (5.46% error rate).
I also found an interesting thing: if you train ResNet-164 for more epochs at the first lr, the accuracy will be higher than ResNet-1001 in Kaiming's paper. However, I haven't saved the log. 😢
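For reference, here is a minimal sketch of the schedule described above using MXNet's `MultiFactorScheduler`. The `epoch_size` computation (50k CIFAR-10 training images at batch size 128) and the `model`/`train_iter`/`val_iter` names are my assumptions, not the exact script used:

```python
import mxnet as mx

batch_size = 128
epoch_size = 50000 // batch_size  # updates per epoch, assuming the full 50k train set

# Drop the lr by 10x at epochs 160, 180, and 200, starting from 0.1.
lr_scheduler = mx.lr_scheduler.MultiFactorScheduler(
    step=[160 * epoch_size, 180 * epoch_size, 200 * epoch_size],
    factor=0.1)

# `model` is a hypothetical mx.mod.Module built from the ResNet-164 symbol;
# `train_iter` and `val_iter` are hypothetical data iterators.
model.fit(train_iter, eval_data=val_iter,
          optimizer='sgd',
          optimizer_params={'learning_rate': 0.1,
                            'momentum': 0.9,
                            'wd': 0.0001,
                            'lr_scheduler': lr_scheduler},
          num_epoch=200)
```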
This may be useful when training ImageNet: I heard about someone using mx.lr_scheduler.MultiFactorScheduler(step=[60*epoch_size, 70*epoch_size, 80*epoch_size], factor=0.1),
and this can lead to a good result, since it runs longer before the first lr decrease.
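A hedged sketch of how that scheduler might be wired up for ImageNet; the dataset size and batch size here are assumptions:

```python
import mxnet as mx

num_examples = 1281167  # ImageNet-1k training set size (assumed)
batch_size = 256        # assumed
epoch_size = num_examples // batch_size  # updates per epoch

# Keep lr at 0.1 for the first 60 epochs, then drop 10x at epochs 60, 70, 80.
scheduler = mx.lr_scheduler.MultiFactorScheduler(
    step=[60 * epoch_size, 70 * epoch_size, 80 * epoch_size],
    factor=0.1)

optimizer = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9,
                             wd=0.0001, lr_scheduler=scheduler)
```

The resulting optimizer object can then be passed to `fit` via its `optimizer` argument instead of the `'sgd'` string.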
@austingg how fast is ResNet on CIFAR-10? You can borrow several of my EC2 g2.8x machines if you want to do more experiments. A g2.8x is roughly equal to one K40.
@mli I use two GTX 1080 GPUs. With batch size 128, the speed is about 700 samples/sec.
A larger batch size gives higher throughput, but it may hurt the final accuracy.
INFO:root:Epoch[146] Batch [50] Speed: 740.35 samples/sec Train-accuracy=0.959375
INFO:root:Epoch[146] Batch [100] Speed: 719.88 samples/sec Train-accuracy=0.956250
INFO:root:Epoch[146] Batch [150] Speed: 719.03 samples/sec Train-accuracy=0.955937
INFO:root:Epoch[146] Batch [200] Speed: 720.36 samples/sec Train-accuracy=0.952812
INFO:root:Epoch[146] Batch [250] Speed: 720.53 samples/sec Train-accuracy=0.953281
INFO:root:Epoch[146] Batch [300] Speed: 716.25 samples/sec Train-accuracy=0.957344
INFO:root:Epoch[146] Batch [350] Speed: 700.23 samples/sec Train-accuracy=0.958281
INFO:root:Epoch[146] Resetting Data Iterator
INFO:root:Epoch[146] Time cost=69.630
One epoch takes about 70 s, so 200 epochs take roughly 200 × 70 s ≈ 14,000 s, i.e. just under 4 hours.
@tornadomeet @mli
This time I trained ResNet-164 for more epochs (300), dropping the lr at epochs 220, 260, and 280.
The best error rate is 4.68%, which is better (lower) than the ResNet-1001 result reported in Kaiming's paper (4.96%). Interesting!
I don't know whether you have noticed that, for CIFAR-10, the paper splits off 5k images as a validation set, so the paper actually used only 45k images for training.
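For anyone who wants to reproduce that protocol, here is a minimal sketch of the 45k/5k split, assuming the 50k CIFAR-10 training images are already loaded as NumPy arrays `X` (images) and `y` (labels); those names are hypothetical:

```python
import numpy as np
import mxnet as mx

# X: (50000, 3, 32, 32) images, y: (50000,) labels -- hypothetical arrays.
rng = np.random.RandomState(0)  # fixed seed for a reproducible split
perm = rng.permutation(len(y))
train_idx, val_idx = perm[:45000], perm[45000:]

X_train, y_train = X[train_idx], y[train_idx]  # 45k for training
X_val, y_val = X[val_idx], y[val_idx]          # 5k held out for validation

train_iter = mx.io.NDArrayIter(X_train, y_train, batch_size=128, shuffle=True)
val_iter = mx.io.NDArrayIter(X_val, y_val, batch_size=128)
```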