mattjj/pybasicbayes

SVI updates

Closed this issue · 7 comments

There may be a small bug in the SVI update for Gaussians with a NIW prior. I've made a line comment here at line 311.

Basically, the hyperparameters should also be multiplied by 1./minibatchfrac, since they are part of the intermediate parameters. It doesn't look like the code applies any simplification of the update from the SVI paper that would account for this.

Hmmm, maybe, but let's think about this. If you take a look at this pdf (http://www.mit.edu/~mattjj/content/svi.pdf), in Eq. 13 I use the symbol s for the thing called 1./minibatchfrac in the code, and as you can see it only scales the expected statistics and not the global hyperparameter. That comes from a combination of Eq. 11 (which writes the natural gradient as an expectation over the latent business for the full dataset) and Eq. 2 (which writes a gradient as a scaled expectation). I think because the scaling is meant to rescale a minibatch expectation into a full-dataset expectation, it should only touch the expectation and not the hyperparameter.
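For concreteness, here's a minimal sketch of the step as I read it from the pdf (the function and variable names here are mine for illustration, not the ones in the code):

```python
def svi_step(natural_params, prior_natural, expected_stats, minibatchfrac, stepsize):
    """One stochastic natural-gradient (SVI) step on the global parameters.

    natural_params: current variational natural parameters (flattened array)
    prior_natural:  natural parameters of the prior (the global hyperparameter)
    expected_stats: expected sufficient statistics computed on the minibatch
    minibatchfrac:  fraction of the full dataset covered by the minibatch
    stepsize:       SVI step size rho in (0, 1]
    """
    s = 1. / minibatchfrac  # rescales a minibatch expectation to a full-dataset one
    # s multiplies only the expected statistics, not the prior hyperparameter
    return (1. - stepsize) * natural_params + stepsize * (prior_natural + s * expected_stats)
```

In that form the question is just whether s should also multiply prior_natural, and per Eq. 13 it shouldn't.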

Thoughts?

Going through the pdf now, will get back to you soon.


Maybe it's best to look at Eq. 12 and just write that sum as K times an expectation over a uniformly selected term. That expression is definitely the natural gradient, so sampling that expectation gives an unbiased estimate of the natural gradient. (But I could be wrong...)
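Roughly, the identity I mean is (writing f(k) for the k-th term of the sum in Eq. 12; the notation is mine, not the pdf's):

```latex
\sum_{k=1}^{K} f(k) \;=\; K \,\mathbb{E}_{k \sim \mathrm{Uniform}\{1,\dots,K\}}\!\left[ f(k) \right]
```

so drawing a single k uniformly and scaling that term by K is an unbiased estimate of the sum, and hence of the natural gradient.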

I take it back; the code you have looks correct. In my implementation I applied _posterior_hyperparms to the weighted statistics and then converted the result to natural parameters, which is what caused my confusion (and which may itself have been incorrect).

Okay cool, definitely worth a check though. I think everything is easier in natural coordinates; I want to rewrite this code to use natural parameters everywhere. (I've been burned by other parameterizations and/or the conversions between them too many times...)
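For example, here's a rough sketch of the kind of conversion I mean for the NIW case, assuming the standard (mu_0, kappa_0, Psi, nu_0) parameterization; there are a few equivalent natural-parameter conventions, and these function names are made up for illustration:

```python
import numpy as np

def niw_standard_to_natural(mu_0, kappa_0, Psi, nu_0):
    """Map standard NIW hyperparameters to one convention of natural parameters."""
    d = mu_0.shape[0]
    return (kappa_0 * mu_0,                        # eta_1
            kappa_0,                               # eta_2
            Psi + kappa_0 * np.outer(mu_0, mu_0),  # eta_3
            nu_0 + d + 2)                          # eta_4

def niw_natural_to_standard(eta_1, eta_2, eta_3, eta_4):
    """Inverse map, recovering (mu_0, kappa_0, Psi, nu_0)."""
    d = eta_1.shape[0]
    kappa_0 = eta_2
    mu_0 = eta_1 / kappa_0
    Psi = eta_3 - kappa_0 * np.outer(mu_0, mu_0)
    nu_0 = eta_4 - d - 2
    return mu_0, kappa_0, Psi, nu_0
```

In natural coordinates the SVI step is just a convex combination of parameter tuples, so there's no conversion step left to get wrong.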

I was actually going to email you about that because I want to use pybasicbayes for another project and natural coordinates are just easier.


Sent you an email!