Question about model parameters, UNK words, loss function and spell error correction system
weiqi94 opened this issue · 3 comments
Hi,
I found that some parameters mentioned in Section 5.2 (learning rate and weight decay) are not consistent with the released train.sh script. For the single models shown in Table 5, how did you set these parameters in the Single Model Ablation Study?
Also, do all single models shown in Table 5 use the edit-weighted MLE? And does "Ignoring UNK words as edits" mean replacing the UNK token with the source word (i.e., using the "--replace-unk" parameter), or just dropping the token?
I noticed that you use a statistical spelling error correction system to pre-process the training data. Where can I find this system?
There are small differences, since I rewrote all the code on top of the latest fairseq codebase.
Yes, edit-weighted MLE is used for all the models that I reported.
"Ignoring UNK words" means that "UNK" is not counted as an edit when calculating the precision/recall scores.
I used the spelling correction system developed by "YuanFuDao", which is not public. However, you can do the spelling correction with any other spelling correction system.
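For readers unfamiliar with the term: edit-weighted MLE is usually formulated as a token-level negative log-likelihood in which target tokens that belong to an edit get a weight larger than 1. The following is a minimal sketch under that assumption, not the code from this repository. The function name, the `is_edit` mask, and the default weight of 3.0 are hypothetical and chosen only for illustration.

```python
import math


def edit_weighted_nll(token_log_probs, is_edit, edit_weight=3.0):
    """Sum of per-token negative log-likelihoods, up-weighting edited tokens.

    token_log_probs: log P(y_t | ...) for each target token (hypothetical input).
    is_edit: booleans, True where the target token is part of an edit.
    edit_weight: weight for edited tokens (assumed value); other tokens get 1.0.
    """
    if len(token_log_probs) != len(is_edit):
        raise ValueError("token_log_probs and is_edit must have the same length")
    total = 0.0
    for log_prob, edited in zip(token_log_probs, is_edit):
        weight = edit_weight if edited else 1.0
        total += -weight * log_prob
    return total


# Two target tokens: the first is copied from the source, the second is an edit.
loss = edit_weighted_nll([math.log(0.5), math.log(0.25)], [False, True])
```

With `edit_weight=1.0` this reduces to the ordinary MLE loss, which is a convenient sanity check.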
Thanks for your explanation.
So, does it mean we can also use the parameters in your train.sh to replicate the Single Model Ablation Study shown in Table 5?
I am also confused about the description of Figure 2 in the paper. Is it reversed? It seems that Figure 2(b) mainly focuses its weights on the next word in good order.
You can modify the m2scorer scripts to ignore the "UNK" edits. The parameters in the train.sh can't replicate the ablation results.
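As a rough illustration of what ignoring "UNK" edits could look like, here is a minimal, self-contained sketch. It is not the m2scorer code and does not use its API. The edit representation `(start, end, correction)`, the `<unk>` token string, and both function names are assumptions made for the example.

```python
UNK = "<unk>"  # assumed spelling of the unknown-word token


def drop_unk_edits(edits, unk=UNK):
    """Remove system edits whose correction contains the UNK token."""
    return [e for e in edits if unk not in e[2].split()]


def precision_recall(system_edits, gold_edits):
    """Edit-level precision and recall by exact match of (start, end, correction).

    Convention used in this sketch: precision is 1.0 when no edits are
    proposed, and recall is 1.0 when there are no gold edits.
    """
    sys_set, gold_set = set(system_edits), set(gold_edits)
    true_pos = len(sys_set & gold_set)
    precision = true_pos / len(sys_set) if sys_set else 1.0
    recall = true_pos / len(gold_set) if gold_set else 1.0
    return precision, recall


# Hypothetical edits: the system proposes one correct edit and one UNK edit.
system = [(1, 2, "went"), (3, 4, "<unk>")]
gold = [(1, 2, "went"), (5, 6, "the")]

raw = precision_recall(system, gold)
filtered = precision_recall(drop_unk_edits(system), gold)
```

In this toy case the UNK edit counts as a false positive before filtering, so dropping it raises precision while recall stays the same.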
Thanks very much. I just noticed that the two pictures are reversed.