gaozhihan/PreDiff

Question about the design of shifted predicted mean

OswaldoBornemann opened this issue · 1 comment

In the paper, you mention that after training the knowledge control network, the predicted mean needs to be shifted by $-\lambda_{\mathcal{F}}\Sigma_{\theta}\nabla_{z_t}||U_{\phi}(z_t, t, y) - \mathcal{F}_0(y)||$. However, I think the term $||U_{\phi}(z_t, t, y) - \mathcal{F}_0(y)||$ should be close to zero, because minimizing it is exactly the training objective of the knowledge control network. Therefore, the gradient would also be very close to zero.

May I ask why you would design the form like this?

Thank you for your question. While the term $||U_{\phi}(z_t, t, y) - \mathcal{F}_0(y)||$ is close to zero when $z_t$ is expected to obey the constraint, this does not imply that the gradient of this term with respect to $z_t$ is also close to zero. The value of a norm and the gradient of that norm are different quantities: for example, $|x|$ is arbitrarily small near $x = 0$, yet its derivative has magnitude 1 everywhere except at $x = 0$ itself.
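
The point can be checked numerically with a small sketch. This is not PreDiff code: the identity map standing in for $U_{\phi}$, the three-dimensional `z_t`, and the names `residual_norm` and `numerical_grad` are all assumptions made for illustration. It only shows that an unsquared Euclidean norm can be tiny while its gradient still has magnitude close to 1.

```python
import math

def residual_norm(z, target):
    """Euclidean norm ||U(z) - target||, with U assumed to be the identity (toy stand-in)."""
    return math.sqrt(sum((zi - ti) ** 2 for zi, ti in zip(z, target)))

def numerical_grad(f, z, eps=1e-6):
    """Central finite-difference gradient of f at z."""
    grad = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        grad.append((f(z_plus) - f(z_minus)) / (2 * eps))
    return grad

# Hypothetical target constraint value and a latent that almost satisfies it.
target = [0.5, -1.0, 2.0]
z_t = [t + 1e-3 for t in target]

value = residual_norm(z_t, target)
grad = numerical_grad(lambda z: residual_norm(z, target), z_t)
grad_norm = math.sqrt(sum(g * g for g in grad))

print(f"norm of residual:  {value:.6f}")      # small
print(f"norm of gradient:  {grad_norm:.6f}")  # close to 1
```

For a residual $r$, the gradient of $||r||$ with respect to $r$ is $r / ||r||$, a unit vector whenever $r \neq 0$, so the guidance direction stays well defined even when the residual is small. A squared norm would behave differently: its gradient $2r$ does shrink to zero along with the residual.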