Question about the difference between the paper and the code in the formulas for composition_score and copy_score
As defined in Equation 5 of the paper, the final probability distribution p_t is a mixture of the generation distribution and the copy distribution:
p_t(w) = (1 − α_copy) * p_gen(w) + α_copy * p_copy(w)
However, the code uses the following formula:
composite_scores = copy_alpha * composite_scores
copy_scores = (1 - copy_alpha) * copy_attn
I am concerned about whether this difference influences the model's behavior and how it affects performance.
Good catch, but I suppose there is no difference between them. You can do some experiments if you are interested in it.
Thanks for your answer. I ran experiments with both the original code and the modified code, based on the provided pretrained model. Performance seems slightly better after the modification. However, I only tried twice, so the improvement may just be random variation.
I agree with your comment. The alpha in the code's formula can be regarded as generate_alpha, i.e. the weight for generate mode.
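For anyone who wants to check the generate_alpha reading numerically, here is a minimal sketch. It assumes the gate is the sigmoid of some learned logit, which this thread does not confirm. The helper names (mix_paper, mix_code), the logit value, and the toy distributions are hypothetical and not taken from the repository.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mix_paper(alpha_copy, p_gen, p_copy):
    # Equation 5 as quoted above: alpha weights the copy distribution.
    return [(1.0 - alpha_copy) * g + alpha_copy * c for g, c in zip(p_gen, p_copy)]

def mix_code(copy_alpha, p_gen, p_copy):
    # Convention of the quoted code: alpha weights the generation scores.
    return [copy_alpha * g + (1.0 - copy_alpha) * c for g, c in zip(p_gen, p_copy)]

# Toy distributions over a 3-word vocabulary (made-up numbers).
p_gen = [0.7, 0.2, 0.1]
p_copy = [0.1, 0.1, 0.8]
logit = 0.4  # hypothetical pre-activation of the gate

alpha = sigmoid(logit)
paper = mix_paper(alpha, p_gen, p_copy)
# Negating the logit turns alpha into 1 - alpha, which recovers Equation 5.
code_flipped = mix_code(sigmoid(-logit), p_gen, p_copy)
# Reusing the same alpha under the code's convention gives a different mixture.
code_same = mix_code(alpha, p_gen, p_copy)

print(paper)
print(code_flipped)
print(code_same)
```

Under that assumption, a model trained from scratch can reach the same mixture with either convention by learning a negated gate logit, since 1 - sigmoid(x) = sigmoid(-x). Swapping the formula on an already trained checkpoint without retraining does change the output, which may be relevant when interpreting the two runs reported above.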