Question about backward propagation
Dear contributors,
For the `marginals` function in the file `DVAE/DVAE-SIMPLE.ipynb`, why do you calculate the gradient of `log_pr` with respect to `log_p`? Taking the gradient of `log_pr` with respect to `theta` makes more sense to me.
Hi Chendi,
I believe the calculation is correct as it stands. It is a well-known fact that the derivative of the log partition function w.r.t. the log unnormalized probabilities yields the conditional marginal probabilities. To see this, note that taking the gradient of the log partition function first puts the partition function p(alpha) in the denominator. You then differentiate what is inside the log, which leaves the sum of the exponentiated log-probabilities of the models that depend on the variable w.r.t. which we are differentiating; this gives p(Xi ^ alpha) in the numerator. The ratio p(Xi ^ alpha)/p(alpha) is exactly the conditional marginal p(Xi | alpha).
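If it helps, here is a small numerical check of that identity. It is only a sketch and is not taken from the notebook: the constraint `alpha` (exactly one variable is true), the weights, and all function names are made up for illustration, and the gradient is approximated with finite differences instead of autograd so that it runs with the standard library alone.

```python
import itertools
import math


def alpha(x):
    # Hypothetical constraint: exactly one of the variables is true.
    return sum(x) == 1


def log_partition(log_w):
    # log_w[i][v] is the log unnormalized weight of X_i = v.
    # Returns the log of the total weight of all assignments satisfying alpha.
    terms = [
        sum(log_w[i][v] for i, v in enumerate(x))
        for x in itertools.product((0, 1), repeat=len(log_w))
        if alpha(x)
    ]
    m = max(terms)  # log-sum-exp with the usual max shift for stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def grad_wrt_log_weight(log_w, i, v, eps=1e-6):
    # Central finite difference of log_partition w.r.t. log_w[i][v].
    up = [row[:] for row in log_w]
    down = [row[:] for row in log_w]
    up[i][v] += eps
    down[i][v] -= eps
    return (log_partition(up) - log_partition(down)) / (2 * eps)


def conditional_marginal(log_w, i, v):
    # Brute-force p(X_i = v | alpha) = p(X_i = v ^ alpha) / p(alpha).
    numerator = 0.0
    denominator = 0.0
    for x in itertools.product((0, 1), repeat=len(log_w)):
        if not alpha(x):
            continue
        weight = math.exp(sum(log_w[j][u] for j, u in enumerate(x)))
        denominator += weight
        if x[i] == v:
            numerator += weight
    return numerator / denominator


# Made-up per-variable probabilities, used only for this illustration.
probs = [0.2, 0.5, 0.7]
log_w = [[math.log(1 - p), math.log(p)] for p in probs]

for i in range(len(log_w)):
    g = grad_wrt_log_weight(log_w, i, 1)
    m = conditional_marginal(log_w, i, 1)
    print(f"X{i}: d log Z / d log w = {g:.6f}, p(X{i}=1 | alpha) = {m:.6f}")
```

The two printed columns should agree up to finite-difference error. Differentiating w.r.t. the log weights rather than the weights themselves is what makes the result come out as a probability without an extra rescaling factor.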
For a more formal proof, you can check Theorem 1 in the paper.