UCLA-StarAI/SIMPLE

Question about backward propagation

Closed this issue · 1 comment

Dear contributors,

For the `marginals` function in the file `DVAE/DVAE-SIMPLE.ipynb`, why do you compute the gradient of `log_pr` with respect to `log_p`? Taking the gradient of `log_pr` with respect to `theta` would make more sense to me.

Hi Chendi,

I believe the calculation is correct as it stands. It is a well-known fact that the derivative of the log partition function with respect to the log unnormalized probabilities yields the conditional probabilities. To see this, note that taking the gradient of the log partition function first produces the partition function p(alpha) in the denominator (from the outer log). Differentiating the sum inside the log then yields the exponentiated log-probabilities of the models that depend on the variable we are differentiating with respect to; this gives p(Xi ^ alpha) in the numerator. The ratio p(Xi ^ alpha)/p(alpha) is exactly the conditional marginal.
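As a small numerical sketch (not the repository's actual code), the identity can be checked in the simplest case: the gradient of logsumexp over the log unnormalized probabilities equals the normalized (softmax) probabilities. Here a finite-difference gradient is compared against the analytic result:

```python
import numpy as np

def log_partition(log_p):
    # log Z = log sum_i exp(log_p_i), computed stably
    m = log_p.max()
    return m + np.log(np.exp(log_p - m).sum())

def normalized_probs(log_p):
    # exp(log_p_i) / Z, i.e. the softmax of the log unnormalized probs
    e = np.exp(log_p - log_p.max())
    return e / e.sum()

rng = np.random.default_rng(0)
log_p = rng.normal(size=5)

# finite-difference gradient of log Z w.r.t. each log unnormalized prob
eps = 1e-6
eye = np.eye(5)
grad = np.array([
    (log_partition(log_p + eps * eye[i]) - log_partition(log_p - eps * eye[i])) / (2 * eps)
    for i in range(5)
])

# the gradient of the log partition function recovers the probabilities
assert np.allclose(grad, normalized_probs(log_p), atol=1e-5)
```

In SIMPLE the same identity is applied with `p(Xi ^ alpha)` in place of the individual unnormalized terms, so backpropagating through `log_pr` directly yields the conditional marginals.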

For a more formal proof, you can check Theorem 1 in the paper.