Not converging due to learning rate alpha
bratao opened this issue · 4 comments
Hello,
While testing RNNSharp, I was unable to make the model converge, no matter what settings I used.
I changed two places and got the model to finally converge:
if (ppl >= lastPPL && lastAlpha != rnn.LearningRate)
{
//Although we reduce alpha value, we still cannot get better result.
I changed it to break only after we have tried to lower alpha 8 times and failed to get an improvement.
I also changed this fragment:
rnn.LearningRate = rnn.LearningRate / 2.0f;
so that alpha decreases at a slower rate, in my case 1.4.
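To make the two changes concrete, here is a minimal sketch of the idea. `PatienceAnnealer` is a hypothetical helper, not part of RNNSharp, and it makes two assumptions that go beyond the description above: the value 1.4 is used as a divisor, and perplexity is compared against the best value seen so far, not only against the previous epoch.

```csharp
using System;

// Hypothetical helper (not RNNSharp code): lowers the learning rate when
// perplexity stops improving and stops after repeated failed attempts.
public sealed class PatienceAnnealer
{
    private readonly float decayFactor;   // divisor applied to alpha, e.g. 1.4 instead of 2.0
    private readonly int maxFailedDecays; // e.g. 8 decays without improvement before stopping
    private int failedDecays;

    public float LearningRate { get; private set; }
    public double BestPpl { get; private set; } = double.MaxValue;

    public PatienceAnnealer(float initialRate, float decayFactor = 1.4f, int maxFailedDecays = 8)
    {
        if (decayFactor <= 1.0f)
            throw new ArgumentOutOfRangeException(nameof(decayFactor), "must be greater than 1");
        LearningRate = initialRate;
        this.decayFactor = decayFactor;
        this.maxFailedDecays = maxFailedDecays;
    }

    // Call once per epoch with the perplexity measured after that epoch.
    // Returns false when training should stop.
    public bool Update(double ppl)
    {
        if (ppl < BestPpl)
        {
            // Improvement: remember it and reset the failure counter.
            BestPpl = ppl;
            failedDecays = 0;
            return true;
        }

        // No improvement, and alpha was already lowered the maximum number of times.
        if (failedDecays >= maxFailedDecays)
            return false;

        // No improvement: lower alpha gently and keep training.
        LearningRate /= decayFactor;
        failedDecays++;
        return true;
    }
}
```

In a training loop this would be called as `if (!annealer.Update(ppl)) break;`, with `annealer.LearningRate` copied to `rnn.LearningRate` before the next epoch.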
Do any of those changes make sense?
Do you think that smarter learning rate annealing could improve RNNSharp, or am I looking in the wrong place?
Thanks!
First of all, choosing the learning rate (alpha) is an open research question, and there are many papers on this problem. A dynamic learning rate may give better results, and sometimes the best learning rate strategy depends on your data set.
Secondly, you mentioned that the model doesn't converge, but I want to make sure which corpus you mean: the training corpus or the validation corpus? Since neither CRF++ nor CRFSharp uses a validation or test corpus, a better result on the training corpus may just reflect overfitting and lead to worse results on a test corpus. So my suggestion is to create a test corpus and compare the quality of the models encoded by CRF++/CRFSharp and RNNSharp on that corpus.
In addition, I have fixed the bug that crashed RNNSharp when the word embedding feature isn't enabled for SimpleRNN. You can get the fix by syncing the latest code from the code base.
@zhongkaifu, I was testing the quality against a separate test corpus.
Yeah, I saw the latest commits, thank you so much!
It is also working without word embedding in LSTM-RNN for me. (LSTM-RNN is faster and produces better results than SimpleRNN in my case.)
One commit said that it would support training a model without a validation corpus. However, I'm not getting good results with it. I get better results (on a separate test corpus) when I use a validation corpus identical to the training corpus.
Yes. A validation corpus is usually required during training in order to get better hyper-parameters, but sometimes we just use the training corpus for some experiments. :)