mashrurmorshed/Torch-KWT

Training set accuracy is lower than test set accuracy


When I reproduced the original paper with your code, after 50,000 steps the model's accuracy on the validation and test sets was greater than 99%, while on the training set it was 91%. This was consistent across four repeated runs. Do you know the reason?

Hello @mars234. Did you get 99% on the validation set, or on the test set? Or both?

I've noticed something similar in my own training curves, though I've trained only up to 23,000 steps, as specified in the paper. At 23K steps, training accuracy is 93.34% and validation accuracy is 96.15%, which is indeed unusual. Test accuracy, however, is 95.98% -- close to validation, as it should be.

In fact, my training accuracy is a bit worse than validation accuracy from the very beginning. Here's a log from wandb, where you can see that validation accuracy is higher than training accuracy from the start.
[W&B chart, 9/19/2021: validation accuracy above training accuracy from the start]

The only possibilities I can currently think of are:

  • Maybe the spectral augmentation (SpecAugment), which is applied only during training, makes it a bit harder for the model to make correct predictions, so training accuracy lags behind validation (see the sketch after this list)
  • There may be some form of data leakage in the actual Google Speech Commands dataset, though I see no easy way to verify this
  • There may be a bug in my code that I'm missing
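To make the first point concrete, here's a minimal sketch of what a train-only spectral augmentation looks like in PyTorch, using torchaudio's masking transforms; the class name and mask sizes are illustrative, not necessarily this repo's exact config:

```python
import torch
import torchaudio.transforms as T

class TrainOnlySpecAugment(torch.nn.Module):
    """SpecAugment-style masking applied only in train mode.

    Mask sizes below are illustrative, not this repo's actual values.
    """
    def __init__(self, freq_mask: int = 7, time_mask: int = 25):
        super().__init__()
        self.freq_mask = T.FrequencyMasking(freq_mask_param=freq_mask)
        self.time_mask = T.TimeMasking(time_mask_param=time_mask)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # self.training is toggled by model.train() / model.eval(), so
        # validation and test spectrograms pass through untouched.
        if self.training:
            spec = self.freq_mask(spec)
            spec = self.time_mask(spec)
        return spec
```

Since training accuracy is computed on these masked spectrograms while validation accuracy is computed on clean ones, the two curves aren't measuring the task at the same difficulty.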
Thank you very much for your reply. Even when I use the smallest model, KWT-1, the accuracy on both the training and validation sets is 99.5%. Because of GPU memory limitations, I set the batch size to 256 and trained for 140 epochs, so my run comes to 58,000 steps, roughly twice as many as in the original paper.
After some investigation, I think the first possibility you mentioned is the most likely. I tried using the model to predict on the original training set without data augmentation, and the results show it can still achieve 99% accuracy. This shows that the model learns the features of the original data well. On the other hand, I checked all of your code and found no bugs.
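For reference, that check looks roughly like the sketch below; `model` and a `train_loader_clean` built with augmentation disabled are assumed placeholders, not names from this repo:

```python
import torch

@torch.no_grad()
def clean_train_accuracy(model, train_loader_clean, device="cuda"):
    """Accuracy on the un-augmented training set, measured in eval mode."""
    model.eval()  # disables dropout and any train-only augmentation
    correct, total = 0, 0
    for specs, labels in train_loader_clean:
        specs, labels = specs.to(device), labels.to(device)
        preds = model(specs).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```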

Thank you very much for open-sourcing your code. I have benefited a lot from it.

Indeed, it is the first case. I sent an email to Axel Berg, the paper's author, and according to him, training accuracy being lower than validation is normal and expected. It is caused by the aggressive augmentation and weight decay used to prevent the model from overfitting. Transformers easily overfit and typically require strong regularization.
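As a rough illustration of that kind of regularization, a KWT-style setup might pair AdamW weight decay with label smoothing; the model and the specific values below are assumptions, not the repo's exact hyperparameters:

```python
import torch

# Stand-in model; in practice this would be the KWT transformer.
model = torch.nn.Linear(40, 35)

# Decoupled weight decay via AdamW; 0.1 is an illustratively aggressive value.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

# Label smoothing softens the targets, adding regularization on top of the
# train-only spectral augmentation discussed above.
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
```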

Axel was also kind enough to send me the official training/validation curve for KWT-1 (orange: train, blue: validation), and you can basically observe the same thing here: validation accuracy remains higher than training all the time.

[Official KWT-1 training/validation curves (orange: train, blue: validation)]