Part 1: weights diverge when using more input samples
First, thank you for these articles!
However, when playing with the code, if I change the number of input samples to 40 I get this result:
w(0): 0.1000 cost: 46.1816
w(1): 4.7754 cost: 92.1105
w(2): -1.8647 cost: 184.7509
w(3): 7.5657 cost: 371.6103
w(4): -5.8276 cost: 748.5129
I solved this by using a learning rate inversely proportional to the number of samples, i.e.
learning_rate = 2 / nb_of_samples
instead of a fixed 0.1.
I tested it with sample sizes from 5 to 10 million, and it seems to always converge now.
I don't know if this makes any mathematical sense; I just wanted to let you know.
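For reference, here is roughly the loop I'm running (a sketch of the part 1 toy example written from memory, so the variable names may not match the notebook exactly):

```python
import numpy as np

# Toy setup, approximating part 1: targets t = 2x + noise, model y = x * w.
nb_of_samples = 40
x = np.random.uniform(0, 1, nb_of_samples)
t = 2 * x + np.random.normal(0, 0.2, nb_of_samples)

def cost(w):
    return ((x * w - t) ** 2).sum()        # sum of squared errors

def gradient(w):
    return (2 * x * (x * w - t)).sum()     # d(cost)/dw, summed over samples

w = 0.1                                    # initial weight
learning_rate = 2.0 / nb_of_samples        # instead of a fixed 0.1
for i in range(5):
    print('w({}): {:.4f} cost: {:.4f}'.format(i, w, cost(w)))
    w = w - learning_rate * gradient(w)
```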
Hi Brent,
This definitely makes sense: with the sum of squared errors, the size of
the error signal depends on the number of samples. I am working on a
part 5 that explains stochastic gradient descent (minibatches), where you
would use a sample-size-independent error measure by averaging the error
signals instead of summing them.
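Roughly, the idea is something like this (a sketch, not the exact code that will be in part 5):

```python
import numpy as np

# Averaging the error signals instead of summing them: the gradient
# magnitude no longer grows with the number of samples, so a fixed
# learning rate (e.g. 0.1) keeps working.
def cost(w, x, t):
    return ((x * w - t) ** 2).mean()       # mean squared error

def gradient(w, x, t):
    return (2 * x * (x * w - t)).mean()    # gradient averaged over samples

nb_of_samples = 40
x = np.random.uniform(0, 1, nb_of_samples)
t = 2 * x + np.random.normal(0, 0.2, nb_of_samples)

w, learning_rate = 0.1, 0.1                # fixed learning rate
for i in range(5):
    print('w({}): {:.4f} cost: {:.4f}'.format(i, w, cost(w, x, t)))
    w = w - learning_rate * gradient(w, x, t)
```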
Best regards,
Peter