CSAILVision/semantic-segmentation-pytorch

Evaluation on validation set while training

iliadsouti opened this issue · 5 comments

Hi,

The previous version of code (before synchronous BN) had evaluation while training (after each epoch). It seems that the new version does not have it. Is it because of Sync BN training/evaluation difference?

Thanks

Of course not. We're not doing this because:

  1. We have implemented loading from a checkpoint.
  2. The learning rate policy we use is "poly", which depends on the current iteration count.
    Therefore, if you'd like to test the model's performance after each epoch in order to adjust the hyper-parameters, it is more reasonable to invoke the test script manually.
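The "poly" policy mentioned above scales the base learning rate by a factor that decays with the current iteration, which is why it cannot be computed from the epoch number alone. A minimal sketch (`power=0.9` is a common default for this policy, not necessarily the exact value used in this repo):

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Return the "poly" learning rate at iteration `cur_iter`:
    base_lr * (1 - cur_iter / max_iter) ** power."""
    return base_lr * (1 - cur_iter / max_iter) ** power

# The rate starts at base_lr and decays smoothly to zero at max_iter.
print(poly_lr(0.02, 0, 100000))       # 0.02 at the start
print(poly_lr(0.02, 100000, 100000))  # 0.0 at the end
```

Because the rate is already near zero late in training, resuming extra epochs from a "best" checkpoint gains little, which is the point made below.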
  1. Loading from a checkpoint is not an efficient way to evaluate during training.
  2. It is not just about hyperparameters but about monitoring the training. In many cases, the best-performing model is not the last epoch's model. Manually invoking the test script after each epoch is not efficient at all.
  1. If you've read our code carefully, you may have noticed that we provide a multi-GPU validation script, which is quite different from the training script.
  2. If you understand what the "poly" learning rate policy is, and have run several experiments with it, you will notice that the best model is almost always the last one. Even when it isn't, it makes no sense to resume from the best checkpoint, because the learning rate has been gradually decayed and should have reached zero by then. It is, as a matter of fact, more reasonable to reset hyper-parameters such as the base learning rate or the number of training epochs. Manually invoking the validation procedure isn't in conflict with monitoring the training procedure.
  3. Personally, I never use validation accuracy on a subset of a not-very-large dataset as a reference for model selection. It is highly unreliable. Instead, I use a script that automatically tests the checkpoints one by one on the full validation set. Again, if you've read our multi-GPU validation code, you'll notice that multi-GPU validation is realized by manually spawning several workers from Python, while multi-GPU training is realized by PyTorch's DataParallel.

Thanks for your clear comments.
About the "poly" learning rate policy: it is not always the case. It depends heavily on the dataset size (e.g. Pascal VOC is much smaller than ADE20K). Overfitting can happen even with the "poly" policy.
I think the best workaround would be to manually invoke the validation procedure after each epoch of training and then resume training.
One other question: why is the number of iterations in each epoch independent of the batch size?

Thanks

Sure, it is not always the case with the "poly" learning rate policy, which is rather frustrating :(

As for your question: I know it looks weird, but if you are using batch norm, the batch size should be at least 16, otherwise you are very likely to get a sub-optimal result. If someone doesn't have 8 GPUs but still wants to use our framework, he or she should not train the BN layers. The consequent changes include adjusting the learning rate policy, the total number of training iterations, and so on. Note that the training iterations with and without batch norm don't satisfy the "linear scaling rule", i.e. the rule that the batch size times the number of training iterations stays constant. Therefore we have left the number of iterations per epoch as a free parameter.
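"Not training the BN layers" can be sketched as below. This is an assumption about how one might do it in PyTorch, not this repo's actual code: put every BatchNorm layer in eval mode (so running statistics stay fixed) and stop gradients on its affine parameters.

```python
import torch.nn as nn

# Public BatchNorm classes to match against.
BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def freeze_bn(model):
    """Freeze all BatchNorm layers: eval mode keeps the running mean/var
    fixed, and requires_grad=False stops updates to weight/bias.
    Note: model.train() flips BN layers back to training mode, so this
    must be re-applied after every call to model.train()."""
    for m in model.modules():
        if isinstance(m, BN_TYPES):
            m.eval()
            for p in m.parameters():
                p.requires_grad = False
    return model
```

With BN frozen, the batch-size constraint above is relaxed, but the learning rate schedule and total iteration count still need to be re-tuned, as noted.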