Question on equation (4) in the paper
Hi, thanks for your great work! I have a question about equation (4). Equation (4) calculates the gradient of Q w.r.t. \theta:

\partial Q / \partial \theta \propto \sum_i \sum_k \gamma_{i,k} (\psi_{i,k} - x_i) \partial \psi_{i,k} / \partial \theta,

where Q is equation (2):

Q(\theta, \theta^old) = \sum_i \sum_k \gamma_{i,k} log P(x_i, z_{i,k} = 1 | \psi_{i,k}),

since \psi is related to \theta through the network, \psi_k = f_\phi(\theta_k).

I know the first part is \gamma_{i,k}, and my question is how to calculate the gradient of the second part, \partial log P(x_i, z_{i,k} = 1 | \psi_{i,k}) / \partial \psi_{i,k}. Did you use log P(x, z | \psi) = log P(x | z, \psi) + log P(z) to get the result? But there is still no equation for P(z).

Sorry to bother you; I know this is really a basic question, but I tried for a long time and did not get the result. Thanks for your time, and any help would be appreciated.
Hi, yes indeed, we use log p(x, z | \psi) = log p(x | z, \psi) + log p(z).

The former is the log-likelihood, which, when using a Bernoulli or Gaussian distribution for the pixels, can be differentiated w.r.t. \theta by backpropagating through f_\phi (that is the third term in the first equation you mentioned; the second term in that same equation is obtained by taking the derivative of a Gaussian or Bernoulli w.r.t. its mean parameter -- note the proportionality sign).

The latter term, log p(z), is simply the prior belief about what fraction of all pixels is present in each group (i.e. the size of the cluster). In this work we assumed a uniform prior (all clusters are of the same size), so this term becomes a constant that can be ignored when computing the derivative w.r.t. \theta -- only the likelihood term needs to be considered.
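To make this concrete, here is a quick numerical sketch (plain NumPy, with made-up values for x, \psi, \sigma and K; not code from this repo) showing that a uniform log p(z) drops out of the gradient, leaving only the likelihood term:

```python
# Minimal sketch: with log p(x, z | psi) = log p(x | z, psi) + log p(z)
# and a uniform prior over K groups, log p(z) is a constant, so
# d/dpsi log p(x, z | psi) reduces to the Gaussian likelihood term.
import numpy as np

sigma = 0.25                     # pixel noise scale (arbitrary)
log_prior = np.log(1.0 / 3.0)    # uniform prior over K = 3 groups -> constant

def log_joint(x, psi):
    log_lik = -0.5 * np.log(2 * np.pi * sigma**2) - (x - psi)**2 / (2 * sigma**2)
    return log_lik + log_prior   # adding a constant does not affect the gradient

x, psi, eps = 0.8, 0.3, 1e-6
analytic = (x - psi) / sigma**2  # derivative of the likelihood term alone
numeric = (log_joint(x, psi + eps) - log_joint(x, psi - eps)) / (2 * eps)
print(analytic, numeric)         # both ~8.0: the prior contributes nothing
```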
In case you are interested, there was follow-up work at AAAI by Yuan et al. where the prior is also learned: https://ojs.aaai.org/index.php/AAAI/article/view/4947
Hope this helps and don't hesitate to reach out should anything else be unclear!
Thanks a lot!! Thanks for your quick and detailed reply. I understand now.
I will read the follow-up paper once I get through this one. 👍
Best wishes!
Hi Sjoerd (@sjoerdvansteenkiste),
I was also working through the derivation of equation (4), and I keep coming out off by a sign, getting (x_i - \psi_{i,k}) where the paper has (\psi_{i,k} - x_i).

I used the product rule mentioned above, but when taking the derivative of the log-likelihood I start from the Gaussian,

log P(x_i | z_{i,k} = 1, \psi_{i,k}) = -(x_i - \psi_{i,k})^2 / (2 \sigma^2) + const.

Thus, taking the derivative w.r.t. \psi_{i,k},

\partial log P / \partial \psi_{i,k} = (x_i - \psi_{i,k}) / \sigma^2,

which seems to imply that equation (4) should have (x_i - \psi_{i,k}) rather than (\psi_{i,k} - x_i).
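As a sanity check, I also compared the analytic expression against finite differences on a toy setup (a scalar sigmoid standing in for f_\phi, arbitrary numbers, \sigma = 1), and it agrees with (x_i - \psi_{i,k}):

```python
# Toy check of eq. (4): a scalar sigmoid stands in for f_phi, and the
# analytic gradient sum_i gamma_i (x_i - psi) dpsi/dtheta is compared
# against a finite-difference derivative of Q (arbitrary values).
import numpy as np

x = np.array([0.9, 0.1])        # two "pixels"
gamma = np.array([0.7, 0.4])    # soft assignments to a single group k

def f(theta):                    # stand-in for f_phi
    return 1.0 / (1.0 + np.exp(-theta))

def Q(theta):                    # sum_i gamma_i * log P(x_i | z=1, psi), Gaussian
    psi = f(theta)
    return np.sum(gamma * -(x - psi)**2 / 2)

theta, eps = 0.3, 1e-6
psi = f(theta)
analytic = np.sum(gamma * (x - psi) * psi * (1 - psi))  # (x - psi), not (psi - x)
numeric = (Q(theta + eps) - Q(theta - eps)) / (2 * eps)
print(analytic, numeric)         # they match, suggesting the sign is flipped
```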
Kind regards,
Ben
Hi Ben,
Sorry for the delay; I am only sporadically interacting with GitHub these days. At first glance it looks like you are right and this is indeed a typo in the paper. Inspecting the code (nem_model.py, line 100) also suggests that we use x - \psi_k.
I vaguely remember that we went back and forth a couple of times on the sign in some of these equations, but it has been over 5 years now, so it is a little hard to remember :) In principle it shouldn't matter whether we do gradient ascent on the log-likelihood (according to the derivation) or descent on the negative log-likelihood, though perhaps we accidentally ended up mixing these formulations in the paper. If this is important to you, I'll see if I can find some time to go through the derivations once more and provide a definitive answer.
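To illustrate the point about the two formulations, a throwaway example (made-up numbers, nothing from the repo): ascent on log L and descent on -log L take exactly the same step, so the choice only matters for keeping the signs consistent within the paper:

```python
# Ascent on log L and descent on -log L produce the same update,
# shown on the Gaussian log-likelihood gradient (arbitrary values).
def grad_log_lik(psi, x=0.9, sigma=1.0):
    return (x - psi) / sigma**2                 # d/dpsi log N(x; psi, sigma^2)

psi_a = psi_d = 0.2
lr = 0.1
psi_a = psi_a + lr * grad_log_lik(psi_a)        # gradient ascent on log L
psi_d = psi_d - lr * (-grad_log_lik(psi_d))     # gradient descent on -log L
print(psi_a, psi_d)                             # identical: 0.27 and 0.27
```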
Best,
Sjoerd
Hi Sjoerd,
Thanks for the response! No worries about the delay. We were assigned to make a video review of your paper, so I was just checking the derivations. That's been handed in now, so you don't have to go through the derivations again!
Thanks again,
Ben