Part 1: weights diverge when using more input samples
First, thank you for these articles!
However, when playing with the code, if I change the number of input samples to 40 I get this result:
w(0): 0.1000 cost: 46.1816
w(1): 4.7754 cost: 92.1105
w(2): -1.8647 cost: 184.7509
w(3): 7.5657 cost: 371.6103
w(4): -5.8276 cost: 748.5129
I solved this by using a learning rate inversely proportional to the number of samples, i.e.
learning_rate = 2 / nb_of_samples
instead of a fixed 0.1.
I tested it with sample sizes from 5 to 10 million, and it seems to always converge now.
I don't know if this makes any mathematical sense; I just wanted to let you know.
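For reference, here is roughly the loop I'm running (a sketch of the part 1 toy example written from memory, so the variable names may not match the notebook exactly):

```python
import numpy as np

# Toy setup, approximating part 1: targets t = 2x + noise, model y = x * w.
nb_of_samples = 40
x = np.random.uniform(0, 1, nb_of_samples)
t = 2 * x + np.random.normal(0, 0.2, nb_of_samples)

def cost(w):
    return ((x * w - t) ** 2).sum()        # sum of squared errors

def gradient(w):
    return (2 * x * (x * w - t)).sum()     # d(cost)/dw, summed over samples

w = 0.1                                    # initial weight
learning_rate = 2.0 / nb_of_samples        # instead of a fixed 0.1
for i in range(5):
    print('w({}): {:.4f} cost: {:.4f}'.format(i, w, cost(w)))
    w = w - learning_rate * gradient(w)
```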
Hi Brent,
This definitely makes sense: with the sum of squared errors, the size of
the error signal depends on the number of samples. I am working on a
part 5 that explains stochastic gradient descent (minibatches), where you
would use a sample-size-independent error measure by averaging the error
signals instead of summing them.
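Roughly, the idea is something like this (a sketch, not the exact code that will be in part 5):

```python
import numpy as np

# Averaging the error signals instead of summing them: the gradient
# magnitude no longer grows with the number of samples, so a fixed
# learning rate (e.g. 0.1) keeps working.
def cost(w, x, t):
    return ((x * w - t) ** 2).mean()       # mean squared error

def gradient(w, x, t):
    return (2 * x * (x * w - t)).mean()    # gradient averaged over samples

nb_of_samples = 40
x = np.random.uniform(0, 1, nb_of_samples)
t = 2 * x + np.random.normal(0, 0.2, nb_of_samples)

w, learning_rate = 0.1, 0.1                # fixed learning rate
for i in range(5):
    print('w({}): {:.4f} cost: {:.4f}'.format(i, w, cost(w, x, t)))
    w = w - learning_rate * gradient(w, x, t)
```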
Best regards,
Peter