tornadomeet/ResNet

ResNet-164 on CIFAR-10

I have tested this repository on CIFAR-10 with ResNet-164.
[image: learning curves]

I decreased the lr by a factor of 10 at epochs 160, 180, and 200. The beginning lr is 0.1.

The best top-1 accuracy is 94.77% (5.23% error), which is better than the result in Kaiming's paper (5.46% error rate).

I also found an interesting thing: if you train ResNet-164 for more epochs at the first lr, the accuracy will be higher than ResNet-1001 in Kaiming's paper. However, I haven't saved the log. 😢

This may also be useful when training ImageNet: I heard about someone using mx.lr_scheduler.MultiFactorScheduler(step=[60*epoch_size, 70*epoch_size, 80*epoch_size], factor=0.1), and this can lead to good results when you run longer before the first lr decrease.
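For reference, here is a minimal sketch of wiring such a schedule into MXNet's SGD optimizer; the num_examples, batch_size, and optimizer settings below are illustrative assumptions, not this repo's actual config.

```python
import mxnet as mx

# Assumed ImageNet-1k-style values, purely for illustration.
num_examples = 1281167                   # training images
batch_size = 256
epoch_size = num_examples // batch_size  # parameter updates per epoch

# Multiply the lr by 0.1 after epochs 60, 70, and 80.
lr_scheduler = mx.lr_scheduler.MultiFactorScheduler(
    step=[60 * epoch_size, 70 * epoch_size, 80 * epoch_size],
    factor=0.1)

optimizer_params = {
    'learning_rate': 0.1,      # assumed initial lr
    'momentum': 0.9,
    'wd': 1e-4,
    'lr_scheduler': lr_scheduler,
}
# Later: mod.fit(..., optimizer='sgd', optimizer_params=optimizer_params)
```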

mli commented

@austingg how fast is ResNet on CIFAR-10? I can lend you several EC2 g2.8x machines if you want to do more experiments. A g2.8x roughly equals one K40.

@mli I use two GTX 1080 GPUs. With batch size 128, the speed is about 700 samples/sec (see the sketch after the log below).
A larger batch size gives higher speed, but may hurt the final accuracy.

INFO:root:Epoch[146] Batch [50] Speed: 740.35 samples/sec   Train-accuracy=0.959375
INFO:root:Epoch[146] Batch [100]    Speed: 719.88 samples/sec   Train-accuracy=0.956250
INFO:root:Epoch[146] Batch [150]    Speed: 719.03 samples/sec   Train-accuracy=0.955937
INFO:root:Epoch[146] Batch [200]    Speed: 720.36 samples/sec   Train-accuracy=0.952812
INFO:root:Epoch[146] Batch [250]    Speed: 720.53 samples/sec   Train-accuracy=0.953281
INFO:root:Epoch[146] Batch [300]    Speed: 716.25 samples/sec   Train-accuracy=0.957344
INFO:root:Epoch[146] Batch [350]    Speed: 700.23 samples/sec   Train-accuracy=0.958281
INFO:root:Epoch[146] Resetting Data Iterator
INFO:root:Epoch[146] Time cost=69.630 

One epoch costs about 70 s, so 200 epochs cost roughly 4 hours (200 × 70 s ≈ 14,000 s ≈ 3.9 h).
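In case it helps anyone reproduce the setup, below is a minimal sketch of a two-GPU Module run of this kind; the tiny network and random data are stand-ins for this repo's actual ResNet-164 symbol and CIFAR-10 iterators.

```python
import mxnet as mx
import numpy as np

batch_size = 128
ctx = [mx.gpu(0), mx.gpu(1)]  # two GPUs; Module splits each batch across them

# Dummy CIFAR-10-shaped data so the sketch is self-contained.
x = np.random.rand(1024, 3, 32, 32).astype(np.float32)
y = np.random.randint(0, 10, (1024,)).astype(np.float32)
train_iter = mx.io.NDArrayIter(x, y, batch_size, shuffle=True)

# A toy stand-in network, NOT ResNet-164.
data = mx.sym.Variable('data')
net = mx.sym.Convolution(data, num_filter=16, kernel=(3, 3), pad=(1, 1))
net = mx.sym.Activation(net, act_type='relu')
net = mx.sym.Pooling(net, global_pool=True, pool_type='avg', kernel=(1, 1))
net = mx.sym.FullyConnected(mx.sym.Flatten(net), num_hidden=10)
net = mx.sym.SoftmaxOutput(net, name='softmax')

mod = mx.mod.Module(symbol=net, context=ctx)
mod.fit(train_iter,
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.1, 'momentum': 0.9, 'wd': 1e-4},
        num_epoch=1,
        # logs "Speed: N samples/sec" lines like the ones above
        batch_end_callback=mx.callback.Speedometer(batch_size, 50))
```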

@tornadomeet @mli
This time I trained ResNet-164 with more epochs (300), dropping the lr at epochs 220, 260, and 280.
The best error rate is 4.68%, better than the ResNet-1001 result reported in Kaiming's paper (4.96%). Interesting!
[image: learning curve]

I don't know whether you have noticed that for CIFAR-10 the paper splits off 5k images as a validation set, so the paper actually trained on only 45k images.
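For anyone reproducing that protocol, a minimal sketch of the 45k/5k split (the zero arrays are placeholders standing in for the real 50k CIFAR-10 training images and labels):

```python
import numpy as np

# Placeholders with the real CIFAR-10 training-set shapes.
X = np.zeros((50000, 3, 32, 32), dtype=np.float32)  # images
Y = np.zeros((50000,), dtype=np.int64)              # labels

rng = np.random.RandomState(0)
perm = rng.permutation(len(X))
train_idx, val_idx = perm[:45000], perm[45000:]

X_train, Y_train = X[train_idx], Y[train_idx]  # 45k used for training
X_val, Y_val = X[val_idx], Y[val_idx]          # 5k held out for validation
```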