Overfitting issue
Hello, it's me again!
I was using ARU-Net to find baselines in my old documents with quite a lot of success. Training for 100 epochs seemed to yield good enough results, with consistent training and validation loss values (validation slightly above training at the end). However, when I ran it for 250 epochs, I noticed that the validation loss keeps increasing after around 100 epochs, as can be seen here:
I don't have a lot of data (about 200 labeled images), so I thought the particular split of training and validation data (80% training, 20% validation, randomly assigned) might just have been unlucky in this case. But here is what I get when running the experiment 13 times, each run for 250 epochs with a fresh random split (I'm showing the average loss at each epoch):
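In pseudocode, each run looks roughly like this (`train_one_epoch` and `eval_val_loss` are placeholders standing in for the actual ARU-Net training and validation calls, so this is just a sketch of the protocol):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_one_epoch(train_idx):
    pass  # placeholder: one training pass over the training subset

def eval_val_loss(val_idx):
    return rng.random()  # placeholder: pixel-labeling loss on the validation subset

def run_once(n_images=200, epochs=250, val_fraction=0.2):
    idx = rng.permutation(n_images)
    split = int(n_images * (1 - val_fraction))
    train_idx, val_idx = idx[:split], idx[split:]  # fresh 80/20 split per run
    losses = []
    for _ in range(epochs):
        train_one_epoch(train_idx)
        losses.append(eval_val_loss(val_idx))
    return losses

# Average the per-epoch validation loss over 13 independent random splits.
curves = np.array([run_once() for _ in range(13)])
mean_curve = curves.mean(axis=0)
```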
Do you have an idea of what could cause this?
Thanks!
Hi again,
Sorry for the late reply.
First of all, it is nice to hear that you are using the ARU-Net successfully.
The overfitting is typically due to the quite small number of training samples.
The network has enough representational power to essentially memorize the training set.
Hence, you will see this (as you did) for all splits of training and validation
data. To mitigate overfitting, you could use more aggressive data augmentation strategies
(affine, scale, and perspective augmentations should already be enabled...).
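The augmentation itself is configured inside the ARU-Net pipeline, but purely to illustrate the idea, here is a sketch of such a baseline-preserving augmentation using the albumentations library (the parameter ranges are arbitrary examples, not ARU-Net's defaults):

```python
import albumentations as A
import numpy as np

# Spatial transforms are applied identically to the image and the baseline
# mask, so the pixel labels stay consistent with the augmented page.
augment = A.Compose([
    A.Affine(scale=(0.8, 1.2), rotate=(-5, 5), shear=(-3, 3), p=0.7),
    A.Perspective(scale=(0.02, 0.06), p=0.3),
    A.RandomBrightnessContrast(p=0.3),  # photometric: leaves the mask untouched
])

image = np.random.randint(0, 256, (512, 512), dtype=np.uint8)  # dummy page
mask = np.zeros((512, 512), dtype=np.uint8)                    # dummy baseline map
out = augment(image=image, mask=mask)
aug_image, aug_mask = out["image"], out["mask"]
```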
However, you should not rely entirely on the validation loss.
This is because the loss corresponds to the pixel labeling task:
you don't care whether a baseline is detected with a thickness of 3 or 4 pixels,
yet this difference results in quite a large difference in the loss...
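A toy calculation (my own illustration, not from ARU-Net) makes this concrete: the same baseline predicted 4 px thick instead of the 3 px ground truth is perfectly usable, but its pixel-wise cross-entropy is almost three times higher:

```python
import numpy as np

def pixel_bce(target, prob, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    prob = np.clip(prob, eps, 1 - eps)
    return -np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob))

h, w = 32, 100
target = np.zeros((h, w)); target[14:17, :] = 1.0  # 3 px thick ground truth

pred_3px = np.full((h, w), 0.05); pred_3px[14:17, :] = 0.95  # matches thickness
pred_4px = np.full((h, w), 0.05); pred_4px[14:18, :] = 0.95  # one extra row

print(pixel_bce(target, pred_3px))  # ~0.05: low loss
print(pixel_bce(target, pred_4px))  # ~0.14: much higher, same usable baseline
```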
Therefore, I typically extract the baselines and evaluate the quality of the model based on the scheme described in:
https://arxiv.org/abs/1705.03311
which is available via:
https://github.com/Transkribus/TranskribusBaseLineEvaluationScheme
Kind regards,
Tobias