sjoerdvansteenkiste/Neural-EM

Question on equation (4) in the paper

Closed this issue · 5 comments

Hi, thanks for your great work! I have a question about equation (4):
$$\frac{\partial Q}{\partial \theta_k} \propto \sum_i \gamma_{i,k} \, (\psi_{i,k} - x_i) \, \frac{\partial \psi_{i,k}}{\partial \theta_k}$$
Equation (4) computes the gradient of $Q$ with respect to $\theta_k$, where $Q$ is given by equation (2):
$$Q(\theta, \theta^{\text{old}}) = \sum_i \sum_k P(z_{i,k}=1 \mid x_i, \psi_i^{\text{old}}) \, \log P(x_i, z_{i,k}=1 \mid \psi_{i,k})$$
since $\theta_k$ is related to $\psi_{i,k}$ through $f_\phi$.
I know the first part is the posterior $\gamma_{i,k} = P(z_{i,k}=1 \mid x_i, \psi_i^{\text{old}})$, and my question is how to calculate the gradient of the second part, $\frac{\partial}{\partial \theta_k} \log P(x_i, z_{i,k}=1 \mid \psi_{i,k})$?

Did you use $\log P(x, z \mid \psi) = \log P(x \mid z, \psi) + \log P(z)$ to get the result? But there is still no equation for $\log P(z)$.

Sorry to bother you; I know this is really a basic question, but I tried for a long time and did not get the result.
Thanks for your time, and any help would be appreciated!

Hi, yes indeed, we use $\log p(x, z \mid \psi) = \log p(x \mid z, \psi) + \log p(z)$. The former is the log-likelihood, which, when using a Bernoulli or Gaussian distribution for the pixels, can be differentiated w.r.t. $\theta$ by backpropagating through $f_\phi$ (that is the third term in the first equation you mentioned; the second term in that same equation is obtained by taking the derivative of a Gaussian or Bernoulli w.r.t. its mean parameter -- note the proportionality sign). The latter term, $\log p(z)$, is simply the prior belief about what fraction of all pixels is present in each group (i.e. the size of the cluster). In this work we assumed an uninformative uniform prior (all clusters are of the same size), so that this term becomes a constant that can be ignored when computing the derivative w.r.t. $\theta$ -- only the likelihood term needs to be considered.
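Written out (a sketch in my notation here, not verbatim from the paper), applying the chain rule and dropping the prior term, which does not depend on $\psi_{i,k}$, gives

$$\frac{\partial Q}{\partial \theta_k} = \sum_i \gamma_{i,k} \; \frac{\partial \log p(x_i \mid z_{i,k}=1, \psi_{i,k})}{\partial \psi_{i,k}} \; \frac{\partial \psi_{i,k}}{\partial \theta_k},$$

where $\gamma_{i,k} = P(z_{i,k}=1 \mid x_i, \psi_i^{\text{old}})$ is the posterior from the E-step, the middle factor is the derivative of the Gaussian (or Bernoulli) log-likelihood w.r.t. its mean, and the last factor is what backpropagation through $f_\phi$ provides.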

In case you are interested, there was follow-up work at AAAI by Yuan et al. where the prior is also learned: https://ojs.aaai.org/index.php/AAAI/article/view/4947

Hope this helps and don't hesitate to reach out should anything else be unclear!

Thanks a lot!! Thanks for your quick and detailed reply. I understand now.
And I will read the follow-up paper once I get through this one. 👍
Best wishes!

B28LH commented

Hi Sjoerd (@sjoerdvansteenkiste),

I was also working through the derivation for equation (4), and I keep ending up off by a negative sign, getting instead:
$$\frac{\partial Q}{\partial \theta_k} \propto \sum_i \gamma_{i,k} \, (x_i - \psi_{i,k}) \, \frac{\partial \psi_{i,k}}{\partial \theta_k}$$

I used the product rule mentioned above, but when taking the derivative of $\ln p(x|z, \psi)$, I get two negatives cancelling out. I applied the decomposition you mentioned above, with $\ln p(x, z | \psi) = \ln p(x | z, \psi) + \ln p(z)$, and ignored the prior. Setting $\mu = \psi_{i,k}$ as you do in the paper, and leaving $\sigma$ as a constant, we get

$$\begin{align*} \ln p(x_i | z_{i,k}=1, \psi_{i,k}) &= \ln \mathcal{N}(x_i | \mu, \sigma^2) \\ &= \ln \left(\frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)\right) \\ &= - \frac{(x_i - \mu)^2}{2\sigma^2} -\ln \sqrt{2\pi \sigma^2} \end{align*}$$

Thus taking the derivative w.r.t. $\mu$ (aka $\psi_{i,k}$), we get the negative from inside the bracket cancelling the negative out the front:

$$\begin{align*} \frac{\partial}{\partial \mu} \ln \mathcal{N}(x_i | \mu, \sigma^2) &= \frac{x_i - \mu}{\sigma^2} \end{align*}$$
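For reference, this derivative is easy to check symbolically; here is a minimal SymPy sketch (just the Gaussian log-density, not the repository code):

```python
import sympy as sp

# Log-density of a Gaussian with mean mu and fixed variance sigma^2,
# evaluated at a single pixel value x.
x, mu = sp.symbols('x mu', real=True)
sigma = sp.Symbol('sigma', positive=True)

log_pdf = sp.log(sp.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sp.sqrt(2 * sp.pi * sigma ** 2))

# Derivative w.r.t. the mean: simplifies to (x - mu) / sigma**2.
grad = sp.simplify(sp.diff(log_pdf, mu))
print(grad)  # (-mu + x)/sigma**2, i.e. (x - mu)/sigma**2
```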

This seems to imply that equation (4) should have $(x_i - \psi_{i,k})$, not the other way round. Have you incorporated the negative into the proportionality (this seems bizarre, as you are doing gradient ascent, not descent)? Otherwise, if you could kindly point out where my error is (I'm happy to show more working for how I got here), I would be very grateful.

Kind regards,
Ben

Hi Ben,

Sorry for the delay, I am only sporadically interacting with GitHub these days. At first glance it looks like you are right and this is indeed a typo in the paper. Inspecting the code (`nem_model.py`, line 100) also suggests that we use $x - \psi_k$.

I vaguely remember that we went back and forth a couple of times on the sign in some of these equations, but it has been over 5 years now and is a little hard to remember :) In principle it shouldn't matter whether we do gradient ascent on the log likelihood (according to the derivation) or descent on the negative log likelihood, though perhaps we accidentally ended up mixing these formulations in the paper. If this is important to you, then I'll see if I can find some time to go through the derivations once more and provide a definitive answer.
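For what it's worth, here is a minimal NumPy sketch (a toy example, not the repository code) of why the two formulations give the same update:

```python
import numpy as np

# Toy check: for a Gaussian with fixed variance, gradient *ascent* on the
# log-likelihood w.r.t. the predicted mean psi uses (x - psi) / sigma^2,
# which yields exactly the same update as gradient *descent* on the
# negative log-likelihood, whose gradient is (psi - x) / sigma^2.
rng = np.random.default_rng(0)
x = rng.normal(size=5)        # "pixel" values
psi = np.zeros(5)             # predicted means
sigma2 = 1.0
lr = 0.1

grad_loglik = (x - psi) / sigma2    # d/d(psi) of log N(x | psi, sigma^2)
grad_nll = (psi - x) / sigma2       # d/d(psi) of -log N(x | psi, sigma^2)

ascent = psi + lr * grad_loglik     # ascend the log-likelihood
descent = psi - lr * grad_nll       # descend the negative log-likelihood

assert np.allclose(ascent, descent)
print(ascent)                       # both updates move psi towards x
```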

Best,
Sjoerd

B28LH commented

Hi Sjoerd,

Thanks for the response! No worries about the delay. We were assigned to make a video review of your paper, so I was just checking the derivations. That's been handed in now, so you don't have to go through the derivations again!

Thanks again,
Ben