JUGGHM/PENet_ICRA2021

The number of epochs to train the model

chaytonmin opened this issue · 5 comments

Is the number of epochs to train the model (the three stages) 100?
And how long does it take to train the whole model?

Thanks for your interest! In stage 1, ENet takes about 2 to 3 days to converge, at around epoch 20. Stage 2 takes a few hours. Stage 3 is time-consuming and takes around a week. If you have more devices available, we suggest trying higher-resolution inputs in stage 3 and moving the learning rate decay nodes earlier.
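For readers unsure what moving the decay nodes earlier means in practice, here is a minimal sketch of a step-decay schedule. It assumes that "decay nodes" are the epochs at which the learning rate is multiplied by a fixed factor. The function name `step_lr`, the base rate, the factor `gamma`, and both node lists are hypothetical values chosen for illustration, not the settings used in this repository.

```python
from bisect import bisect_right


def step_lr(epoch, base_lr, nodes, gamma=0.5):
    """Return the learning rate at `epoch` under step decay.

    The rate is multiplied by `gamma` once for every decay node
    (epoch index) that `epoch` has reached or passed.
    """
    passed = bisect_right(sorted(nodes), epoch)
    return base_lr * gamma ** passed


# Hypothetical decay nodes, NOT taken from the repository.
default_nodes = [10, 15, 20]
earlier_nodes = [6, 9, 12]

for epoch in (0, 8, 12, 20):
    print(
        epoch,
        step_lr(epoch, 1e-3, default_nodes),
        step_lr(epoch, 1e-3, earlier_nodes),
    )
```

With the earlier nodes the schedule reaches its lowest rate by epoch 12 instead of epoch 20, which shortens the run if the model converges faster.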

Thanks for your quick reply! That is too long for me to train the model right now. Anyway, I may try it later.

@JUGGHM Thanks for your excellent work. I am trying to re-implement your training process, but a few questions confuse me:

  1. Does a larger batch size, e.g. 16 or 32, lead to better performance?
  2. Is a higher resolution in stage 3 better?
  3. I used batch size 12 for stage 2 training, but the result does not seem to improve compared to stage 1.

Looking forward to your reply

Thanks for your interest!

(1) Generally, a larger batch size should perform at least no worse. However, I once tried a batch size of 12 and it failed because I had not adjusted the learning rate decay nodes. I did succeed in experiments on B4 Small (half channels): performance was similar with batch sizes of 10 and 20, with the learning rate decay nodes adjusted accordingly. So the conclusion is that the hyperparameters need further adjustment for larger batch sizes.

(2) I do think so, but I had only 2x11G GPUs when doing this project.

(3) That's right. We don't expect the results after stage 2 to be better than after stage 1. Stage 2 training can be regarded as an initialization step for stage 3.
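One possible reason the decay nodes need retuning with a larger batch size (an assumption on my part, not something stated above) is that the number of optimizer updates per epoch shrinks as the batch grows, so a node fixed at a given epoch is reached after fewer updates. The sketch below only does that arithmetic. The dataset size and the node epoch are hypothetical round numbers, and the helper names are made up for illustration.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of optimizer updates in one pass over the data."""
    return math.ceil(num_samples / batch_size)


def updates_before_epoch(epoch, num_samples, batch_size):
    """Total updates completed when training reaches `epoch`."""
    return epoch * steps_per_epoch(num_samples, batch_size)


NUM_SAMPLES = 60_000  # hypothetical dataset size
FIRST_NODE = 10       # hypothetical epoch of the first decay node

for batch_size in (10, 20):
    print(batch_size, updates_before_epoch(FIRST_NODE, NUM_SAMPLES, batch_size))
```

Under these assumed numbers, doubling the batch size halves the updates completed before the first decay, which is one thing to check when a larger batch fails to train with an unchanged schedule.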

Got it, thanks for your reply. I will try different batch sizes and adjust the hyperparameters accordingly.