HPI-DeepLearning/crnn-lid

Early Stopping occurs during training

Arafat4341 opened this issue · 11 comments

I am trying to train the model, but early stopping suddenly kicks in during training. The model is supposed to train for up to 50 epochs, but it stops at epoch 15 or 16.

Can anyone tell why this early stopping occurs?

What do you mean by "it stops"? Is the accuracy not increasing anymore? Is the script crashing? ...?

@Bartzi Yes, the validation accuracy has not increased for the last 9 epochs.

Well, in order to help, I would need more information. What are you training on? Your own data? What is the current validation accuracy? Did you change anything in the code? What is your train configuration?

Yes, I am training on my own data. The dataset contains English and Japanese audio.
The current validation accuracy is 0.84.

Training configuration:

batch_size: 64
learning_rate: 0.001
num_epochs: 50

data_loader: "ImageLoader"
color_mode: "L"  # L = bw or RGB
input_shape: [129, 500, 1]

model: "topcoder_crnn_finetune" # _finetune"

segment_length: 10  # number of seconds each spectrogram represents
pixel_per_second: 50

label_names: ["EN", "JP"]
num_classes: 2
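
A few of the values in this configuration appear to depend on each other. The sketch below checks them for consistency. It assumes that `input_shape` is ordered as (height, width, channels) and that the image width equals `segment_length * pixel_per_second`, which matches 10 * 50 = 500 here. The helper `check_config` is hypothetical and not part of the repository.

```python
# Configuration values copied from the post above.
config = {
    "color_mode": "L",
    "input_shape": [129, 500, 1],
    "segment_length": 10,
    "pixel_per_second": 50,
    "label_names": ["EN", "JP"],
    "num_classes": 2,
}


def check_config(cfg):
    """Return a list of human-readable inconsistencies (empty if none found).

    Assumption: input_shape is (height, width, channels) and the width is
    segment_length * pixel_per_second.
    """
    problems = []
    height, width, channels = cfg["input_shape"]

    expected_width = cfg["segment_length"] * cfg["pixel_per_second"]
    if width != expected_width:
        problems.append(f"width {width} != expected {expected_width}")

    # "L" is single-channel grayscale; anything else is treated as RGB here.
    expected_channels = 1 if cfg["color_mode"] == "L" else 3
    if channels != expected_channels:
        problems.append(f"channels {channels} != expected {expected_channels}")

    if len(cfg["label_names"]) != cfg["num_classes"]:
        problems.append("num_classes does not match the number of label_names")

    return problems


if __name__ == "__main__":
    print(check_config(config))
```

For the configuration shown, this check should report no inconsistencies.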

It could be that your model converges, or that your learning rate is too high at this point. You could add a callback that scales the learning rate of the Adam optimizer, maybe that helps.
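
One way to scale the learning rate over time is a step-decay schedule. The following is a minimal sketch of such a schedule as a plain Python function; the drop factor and interval are illustrative assumptions, not values used by this repository. A function with this shape (epoch index in, learning rate out) is what Keras' `LearningRateScheduler` callback expects.

```python
def step_decay(epoch, initial_lr=0.001, drop=0.5, epochs_per_drop=10):
    """Return the learning rate for a 0-indexed epoch.

    The rate starts at initial_lr and is multiplied by `drop` every
    `epochs_per_drop` epochs. All default values are illustrative assumptions.
    """
    return initial_lr * (drop ** (epoch // epochs_per_drop))


# Hypothetical wiring into a Keras training script (not executed here):
#   from keras.callbacks import LearningRateScheduler
#   lr_callback = LearningRateScheduler(step_decay)
#   model.fit(..., callbacks=[lr_callback])

if __name__ == "__main__":
    for epoch in (0, 9, 10, 20, 30):
        print(epoch, step_decay(epoch))
```

A fixed schedule like this lowers the rate regardless of how training is going; a plateau-based callback reacts to the monitored metric instead.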

@Bartzi Thanks a lot for your suggestion! Could you tell me how I can modify the callbacks in the training script to scale the learning rate?

The best tip is to have a look at the documentation of Keras 😉
This page, for instance, could be helpful: https://faroit.com/keras-docs/1.2.2/callbacks/

@Bartzi Thanks a lot!

I found ReduceLROnPlateau.
We can add this to reduce the learning rate if the accuracy doesn't improve for a certain number of epochs.
The learning rate we are currently using is 0.001.
Can you suggest a lower bound for the learning rate?
I really can't guess how low is too low!

As with most things in Deep Learning: You have to try. But my first idea would be to set it to 0.0001 and see what happens. Going any lower than 1e-6 as a starting learning rate is not a good idea, so that would be too low.

@Bartzi Thanks a lot.
The model still converges early. Training stops at epoch 23, with no improvement in validation accuracy over the last 20 epochs.

from keras.callbacks import EarlyStopping, ReduceLROnPlateau

early_stopping_callback = EarlyStopping(monitor='val_loss', min_delta=0, patience=15, verbose=1, mode="min")
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.0001)
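
To see how these two callbacks interact, here is a simplified pure-Python simulation using the parameters above and a starting learning rate of 0.001. It is a sketch of the general plateau logic, not Keras' actual implementation (it ignores cooldown and the callbacks' separate `min_delta` handling), and the validation losses are made-up illustrative values.

```python
def simulate(val_losses, lr=0.001, factor=0.2, lr_patience=5,
             min_lr=0.0001, stop_patience=15):
    """Simplified model of ReduceLROnPlateau plus EarlyStopping on val_loss.

    Returns (history, stop_epoch), where history is a list of
    (epoch, learning_rate) pairs with 1-indexed epochs, and stop_epoch is
    None if early stopping never triggers.
    """
    best = float("inf")
    lr_wait = 0    # epochs without improvement since the last LR reduction
    stop_wait = 0  # epochs without improvement since the best val_loss
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            lr_wait = 0
            stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience and lr > min_lr:
                # Scale the rate down, but never below the floor.
                lr = max(lr * factor, min_lr)
                lr_wait = 0
        history.append((epoch, lr))
        if stop_wait >= stop_patience:
            return history, epoch
    return history, None


if __name__ == "__main__":
    # Illustrative losses: improvement for 3 epochs, then a plateau.
    losses = [0.50, 0.45, 0.40] + [0.41] * 20
    history, stop_epoch = simulate(losses)
    for epoch, rate in history:
        print(epoch, rate)
    print("stopped at epoch", stop_epoch)
```

Under these assumptions the learning rate is reduced only twice (0.001 to 0.0002, then clamped at `min_lr` = 0.0001) before early stopping triggers 15 epochs after the best validation loss, so a lower `min_lr` or a larger early-stopping patience would be needed for further reductions to take effect.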

I collected the data from YouTube. The problem is that my dataset is not very big: only 3328 mel-spectrogram images in the training set.
I ran into a problem while downloading the data: youtube_dl returned a "too many requests" error, so I couldn't download as much data as I wanted.
Could the problem be my data? Am I trying to train on really poor data?

Yes, such a small amount of training data might very well be a problem.
Maybe you can try to gather more data, I think this should help with your results.