Alibaba-NLP/StructuralKD

Some questions about formula 3

pikapi111 opened this issue · 1 comment

When $\log P_s(y|x)$ is expanded, why can $\log Z_s(x)$ be pulled out on its own without being multiplied by $P_t(y|x)$?
At the end of formula 3, why can the two summation symbols be merged into one?

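The term is $\log Z_s(x)$, the student's log-partition function, so its value does not depend on the teacher model or on $y$. When $P_t(y|x)$ is multiplied by $\log Z_s(x)$ and summed over $y$, the teacher probabilities sum to 1, so $\sum_{y \in Y} P_t(y|x) \log Z_s(x) = \log Z_s(x)$. In the last line, the two sums are swapped: for each substructure $u$, $\sum_{y \in Y} P_t(y|x) 1_{(u \in y)}$ is the teacher's marginal probability of $u$, so the sum over $y$ is absorbed into that marginal and only the sum over $u$ remains.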
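
For anyone who wants to check both steps numerically, here is a minimal sketch on a toy problem. It is not code from this repository. The setup is an assumption made purely for illustration: structures $y$ are label sequences of length 3 over 2 labels, substructures $u$ are (position, label) pairs, the teacher is an arbitrary distribution over sequences, and the names `p_t`, `score_s`, `log_z_s`, and `marginal_t` are hypothetical.

```python
import itertools
import math
import random

random.seed(0)

# Toy setup (sizes and names are illustrative assumptions, not from the repo).
n_pos, n_labels = 3, 2
structures = list(itertools.product(range(n_labels), repeat=n_pos))       # all y
substructures = [(i, l) for i in range(n_pos) for l in range(n_labels)]   # all u


def parts(y):
    """Return the set of substructures u = (position, label) contained in y."""
    return set(enumerate(y))


# Arbitrary teacher distribution P_t(y|x) over whole structures.
raw = [random.random() for _ in structures]
total = sum(raw)
p_t = {y: r / total for y, r in zip(structures, raw)}

# Student scores Score_s(u, x), one per substructure.
score_s = {u: random.uniform(-1.0, 1.0) for u in substructures}


def total_score(y):
    """Student score of y: the sum of the scores of its substructures."""
    return sum(score_s[u] for u in parts(y))


# Student log-partition function log Z_s(x); it depends on neither y nor the teacher.
log_z_s = math.log(sum(math.exp(total_score(y)) for y in structures))

# Direct definition: -sum_y P_t(y|x) * log P_s(y|x).
direct = -sum(p_t[y] * (total_score(y) - log_z_s) for y in structures)

# Step 1: sum_y P_t(y|x) * log Z_s(x) collapses to log Z_s(x).
pulled_out = sum(p_t[y] * log_z_s for y in structures)

# Step 2: sum_y P_t(y|x) * 1[u in y] is the teacher's marginal for u.
marginal_t = {
    u: sum(p_t[y] for y in structures if u in parts(y)) for u in substructures
}
factorized = -sum(marginal_t[u] * score_s[u] for u in substructures) + log_z_s

print(math.isclose(pulled_out, log_z_s, abs_tol=1e-9))
print(math.isclose(direct, factorized, abs_tol=1e-9))
```

Both checks should print `True`. The only property the second step relies on is that the student's score for $y$ decomposes into a sum over its substructures; the teacher distribution can be anything.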