mattjj/pybasicbayes

SVI updates

Closed this issue · 7 comments

There may be a small bug in the SVI update for Gaussians with a NIW prior. I've made a line comment here at line 311.

Basically, the hyperparameters should also be multiplied by 1./minibatchfrac, since they are part of the intermediate parameters. It doesn't look like the code applies any simplification of the update from the SVI paper that would account for this.

Hmmm, maybe, but let's think about this. If you take a look at this pdf (http://www.mit.edu/~mattjj/content/svi.pdf), in Eq. 13 I use the symbol s for the thing called 1./minibatchfrac in the code, and as you can see it only scales the expected statistics and not the global hyperparameter. That comes from a combination of Eq. 11 (which writes the natural gradient as an expectation over the latent business for the full dataset) and Eq. 2 (which writes a gradient as a scaled expectation). I think because the scaling is meant to rescale a minibatch expectation into a full-dataset expectation, it should only touch the expectation and not the hyperparameter.
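For concreteness, here's a minimal sketch of the step as I read it from the pdf (the function and variable names here are mine for illustration, not the ones in the code):

```python
def svi_step(natural_params, prior_natural, expected_stats, minibatchfrac, stepsize):
    """One stochastic natural-gradient (SVI) step on the global parameters.

    natural_params: current variational natural parameters (flattened array)
    prior_natural:  natural parameters of the prior (the global hyperparameter)
    expected_stats: expected sufficient statistics computed on the minibatch
    minibatchfrac:  fraction of the full dataset covered by the minibatch
    stepsize:       SVI step size rho in (0, 1]
    """
    s = 1. / minibatchfrac  # rescales a minibatch expectation to a full-dataset one
    # s multiplies only the expected statistics, not the prior hyperparameter
    return (1. - stepsize) * natural_params + stepsize * (prior_natural + s * expected_stats)
```

In that form the question is just whether s should also multiply prior_natural, and per Eq. 13 it shouldn't.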

Thoughts?

Going through the pdf now, will get back to you soon.


Maybe it's best to look at Eq. 12 and just write that sum as K times an expectation over a uniformly selected term. That expression is definitely the natural gradient, so sampling that expectation gives an unbiased estimate of the natural gradient. (But I could be wrong...)
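Roughly, the identity I mean is (writing f(k) for the k-th term of the sum in Eq. 12; the notation is mine, not the pdf's):

```latex
\sum_{k=1}^{K} f(k) \;=\; K \,\mathbb{E}_{k \sim \mathrm{Uniform}\{1,\dots,K\}}\!\left[ f(k) \right]
```

so drawing a single k uniformly and scaling that term by K is an unbiased estimate of the sum, and hence of the natural gradient.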

I take it back; the code you have looks correct. In my implementation I applied _posterior_hyperparms to the weighted statistics and then converted the result to natural parameters, which is what caused my confusion (and which may itself have been incorrect).

Okay cool, definitely worth a check though. I think everything is easier in natural coordinates; I want to rewrite this code to use natural parameters everywhere. (I've been burned by other parameterizations and/or the conversions between them too many times...)
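For example, here's a rough sketch of the kind of conversion I mean for the NIW case, assuming the standard (mu_0, kappa_0, Psi, nu_0) parameterization; there are a few equivalent natural-parameter conventions, and these function names are made up for illustration:

```python
import numpy as np

def niw_standard_to_natural(mu_0, kappa_0, Psi, nu_0):
    """Map standard NIW hyperparameters to one convention of natural parameters."""
    d = mu_0.shape[0]
    return (kappa_0 * mu_0,                        # eta_1
            kappa_0,                               # eta_2
            Psi + kappa_0 * np.outer(mu_0, mu_0),  # eta_3
            nu_0 + d + 2)                          # eta_4

def niw_natural_to_standard(eta_1, eta_2, eta_3, eta_4):
    """Inverse map, recovering (mu_0, kappa_0, Psi, nu_0)."""
    d = eta_1.shape[0]
    kappa_0 = eta_2
    mu_0 = eta_1 / kappa_0
    Psi = eta_3 - kappa_0 * np.outer(mu_0, mu_0)
    nu_0 = eta_4 - d - 2
    return mu_0, kappa_0, Psi, nu_0
```

In natural coordinates the SVI step is just a convex combination of parameter tuples, so there's no conversion step left to get wrong.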

I was actually going to email you about that because I want to use pybasicbayes for another project and natural coordinates are just easier.


Sent you an email!