Learning rate

Question

Learning rate

Opened this issue 6 years ago · 9 comments

I don't know how to get the parameters of the Ours-ResNet segmentation network. Could you explain the parameters? Thanks.

Answer 1 · 2018-10-22T00:16:20.000Z

I tried changing the learning rate to 0.01 and the batch size to 4. The loss decreased to 0.0403 within a single epoch (Iter: 37000/39675; the epoch almost finished but then failed), but the program often fails with an error like this:
validating ... terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

So, are there any parameters I need to change? Or do you have any advice on the error?

Answer 2 · 2018-10-22T00:36:15.000Z

I tried changing the learning rate to 0.01 with batchsize=4, and due to limited GPU resources I set
model = torch.nn.DataParallel(model,device_ids=[0]), but after that there is always an error like this:
Iter:36900/39675 Loss:0.0413 imps:3.5 Fin:Mon Oct 22 03:56:00 2018 lr: 0.0009
Iter:36950/39675 Loss:0.0363 imps:3.5 Fin:Mon Oct 22 03:55:56 2018 lr: 0.0009
Iter:37000/39675 Loss:0.0403 imps:3.5 Fin:Mon Oct 22 03:55:53 2018 lr: 0.0009

validating ... terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

The program can't finish an epoch, but the loss has decreased to 0.0403. Is that acceptable, or does it need more epochs?
Do you have any advice on this error?

Answer 3 · 2018-10-23T09:06:27.000Z

@hardBird123, I trained with Adam, setting the initial learning rate to 0.001, but I didn't try to find the optimal learning rate. You can get better results than mine by adopting SGD or just following the method described in https://arxiv.org/pdf/1611.10080.pdf.

Answer 4 · 2018-10-23T09:12:08.000Z

@LeiyuanMa, sorry, I can't help you with that error. It is probably related to a memory leak. In my case, the number of training epochs does not change the performance much, and I haven't tested whether training for 15 epochs is best for the network.

Answer 5 · 2018-10-23T09:13:33.000Z

OK, thanks. I want to confirm: is the file [ilsvrc-cls_rna-a1_cls1000_ep-0001.params] the pretrained weights for training the ResNet38 segmentation network?

Answer 6 · 2018-10-23T09:15:20.000Z

@hardBird123, Yes, that is the right file for the segmentation network.

Answer 7 · 2018-10-23T09:20:15.000Z

Thanks. So is a loss of 0.0403 acceptable?

Answer 8 · 2018-10-30T13:02:26.000Z

Which lr_type (fixed (default) / step / linear) should I choose when training the ResNet38 segmentation network?
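
For reference, learning rate schedules with these three names usually behave roughly as in the sketch below. This is a generic illustration, not the repository's implementation: the function name learning_rate and the step_size and gamma parameters are assumptions made for the example, and the actual lr_type option may define its schedules differently.

```python
def learning_rate(base_lr, iteration, max_iter, lr_type="fixed",
                  step_size=10000, gamma=0.1):
    """Return the learning rate at a given iteration (generic sketch).

    step_size and gamma are hypothetical parameters, not taken from
    the repository discussed in this thread.
    """
    if lr_type == "fixed":
        # Constant learning rate for the whole run.
        return base_lr
    if lr_type == "step":
        # Multiply by gamma once every step_size iterations.
        return base_lr * gamma ** (iteration // step_size)
    if lr_type == "linear":
        # Decay linearly from base_lr at iteration 0 to 0 at max_iter.
        return base_lr * (1.0 - iteration / max_iter)
    raise ValueError("unknown lr_type: " + repr(lr_type))


if __name__ == "__main__":
    # Compare the three schedules at a few points of a 39675-iteration run.
    for it in (0, 10000, 20000, 37000):
        rates = [learning_rate(0.01, it, 39675, t)
                 for t in ("fixed", "step", "linear")]
        print(it, rates)
```

A fixed rate is the simplest choice and pairs naturally with an adaptive optimizer such as Adam, while step and linear decay are more common with SGD. Which one works best for this network would have to be checked experimentally.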

Answer 9 · 2019-02-28T08:56:16.000Z

Hello, I'm a student running this code, and I get a runtime error. Can you give me some tips about this issue?

RuntimeError: size mismatch, m1: [1 x 20], m2: [1 x 20]
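
A general note on reading this message, offered as a sketch rather than a diagnosis of this particular run: errors of this form are typically raised by a matrix multiplication, where the number of columns of m1 must equal the number of rows of m2. The helper names below are hypothetical and work on plain shape tuples, not on PyTorch tensors.

```python
def matmul_shape(m1, m2):
    """Return the result shape of m1 @ m2, or raise if the shapes do not match.

    m1 and m2 are (rows, cols) tuples; this is a hypothetical helper
    that only mirrors the shape rule of matrix multiplication.
    """
    rows1, cols1 = m1
    rows2, cols2 = m2
    if cols1 != rows2:
        raise ValueError(
            "size mismatch, m1: [%d x %d], m2: [%d x %d]"
            % (rows1, cols1, rows2, cols2)
        )
    return (rows1, cols2)


def transpose(shape):
    """Swap rows and columns of a (rows, cols) shape tuple."""
    return (shape[1], shape[0])


if __name__ == "__main__":
    a = (1, 20)
    b = (1, 20)
    try:
        matmul_shape(a, b)  # 20 columns against 1 row: incompatible
    except ValueError as err:
        print(err)
    # With the second operand transposed to 20 x 1 the product is defined.
    print(matmul_shape(a, transpose(b)))
```

In other words, a [1 x 20] input can only be multiplied by a matrix with 20 rows, so it is worth checking which operand in the failing call has an unexpected shape.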