Has anyone trained the transformer model on WMT14 en-de and tested on newstest2014?
minorfox opened this issue · 8 comments
The model has now run for 55000 steps, and the score is still only 0.12xxx.
Is that correct?
batch_size=6250 update_cycle=4 (all defaults)
That does not look correct. The BLEU score should be above 20 at that point. Please check your data and settings.
BTW, we recommend using this parameter set for the WMT14 en-de task:
shared_embedding_and_softmax_weights=true,layer_preprocess=layer_norm,layer_postprocess=none,attention_dropout=0.1,relu_dropout=0.1,adam_beta2=0.98
This setting yields a 27.03 BLEU score after 200000 training steps (averaging the last 5 checkpoints).
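(For anyone assembling this override string programmatically, here is a minimal sketch; the parameter names and values are taken from the comment above, while the dict layout and the way the string is eventually passed to the trainer are assumptions, not repo code.)

```python
# Sketch only: collect the recommended WMT14 en-de overrides and join them
# into the comma-separated key=value string quoted above.
recommended = {
    "shared_embedding_and_softmax_weights": "true",
    "layer_preprocess": "layer_norm",
    "layer_postprocess": "none",
    "attention_dropout": "0.1",
    "relu_dropout": "0.1",
    "adam_beta2": "0.98",
    # values mentioned earlier in this thread
    "batch_size": "6250",
    "update_cycle": "4",
}

param_string = ",".join(f"{k}={v}" for k, v in recommended.items())
print(param_string)
```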
@Glaceon31 Last night I changed the corpus according to the manual, and all settings are the defaults (wmt17de-en/train, newstest14/test).
Here is the log:
2019-08-20 02:50:11.065016: BLEU at step 5000: 0.158979
2019-08-20 05:24:28.222200: BLEU at step 10000: 0.189248
2019-08-20 07:58:33.829838: BLEU at step 15000: 0.195997
2019-08-20 10:32:39.163866: BLEU at step 20000: 0.197948
2019-08-20 13:06:27.287002: BLEU at step 25000: 0.202929
2019-08-20 15:37:04.897173: BLEU at step 30000: 0.207372
2019-08-20 18:07:53.231772: BLEU at step 35000: 0.209581
2019-08-20 20:38:38.589960: BLEU at step 40000: 0.207665
2019-08-20 23:09:15.135549: BLEU at step 45000: 0.208846
It seems, emmm....., still incorrect.
Maybe it is also a Python 3.x problem (replacing "@@ " or something else...); I think I should print the translation results.
I'll do it tomorrow.
Thank you very much!!
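(For anyone following along, a minimal sketch of the post-processing being discussed; the function name is mine, not from the repo. One common Python 3 pitfall is that string tensors come back from TensorFlow as bytes, so a str-based "@@ " replacement silently does nothing unless the tokens are decoded first.)

```python
# Sketch only (not from the repo): restore BPE-merged words before scoring.
# Decode bytes tokens first; otherwise the "@@ " replacement never matches.
def restore_bpe(tokens):
    words = [t.decode("utf-8") if isinstance(t, bytes) else t for t in tokens]
    text = " ".join(words)
    return text.replace("@@ ", "").replace("@@", "")

print(restore_bpe([b"secre@@", b"tive", b"N@@", b"SA"]))  # -> "secretive NSA"
```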
@Glaceon31
In hook.py, lines 172-173, you replace the "@@" in the model output; why don't you do the same for the reference?
I added the replace op to the reference and then got a score of 0.32, compared with 0.21 from the unreplaced reference. This score (0.32) also equals the score computed from the translator.py outputs after a sed 's blablabla' op.
So, is it possible that you forgot to add the replace op to the reference?
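(A quick illustration of why those two numbers can differ: scoring the same de-BPEd hypothesis against a BPE-encoded reference versus a plain one. This uses NLTK's corpus_bleu rather than the repo's own BLEU code, and the toy tokens are mine.)

```python
# Illustration only: the same hypothesis scored against two forms of the reference.
from nltk.translate.bleu_score import corpus_bleu

hyp = "the formerly super secretive NSA".split()            # de-BPEd model output
ref_bpe = "the formerly super secre@@ tive N@@ SA".split()  # reference left in BPE form
ref_plain = "the formerly super secretive NSA".split()      # reference restored to words

print(corpus_bleu([[ref_bpe]], [hyp]))    # deflated: token n-grams no longer match
print(corpus_bleu([[ref_plain]], [hyp]))  # the score a plain reference gives
```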
@minorfox We do not add the replace operation to the reference on the premise that the reference is not BPE-encoded. We don't think it makes sense to apply BPE to the reference. Thanks very much!
@Glaceon31 But the code uses decoded_refs to compute the BLEU score, and decoded_refs comes from feats["references"], which is the BPEd corpus.
I also printed decoded_refs:
[['the', 'formerly', 'super', 'secre@@', 'tive', 'N@@', 'SA', ',', 'once', 'nick@@', 'named', 'No', 'Su@@', 'ch', 'Agency', ',', 'has', 'found', 'itself', 'in', 'very', 'public', 'light', ',', 'and', 'am@@', 'id', 'vi@@', 'cious', 'criticism', ',', 'in', 'past', 'months', 'following', 'a', 'stream', 'of', 're@@', 'vel@@', 'ations', 'about', 'is', 'vast', 'foreign', 'and', 'domestic', 'surveillance', 'programs', '-', 'collectively', 'the', 'product', 'of', 'secret', 'N@@', 'SA', 'files', 'stol@@', 'en', 'from', 'the', 'agency', 'and', 'le@@', 'aked', 'by', 'di@@', 'sen@@', 'chan@@', 'ted', 'former', 'N@@', 'SA', 'contrac@@', 'tor', 'Ed@@', 'ward', 'Snow@@', 'den', '.']]
It is BPEd.
@minorfox You do not need to BPE the reference file for validation.
@Glaceon31 I see what you mean. When I used translator.py, I used the un-BPEd reference to get the BLEU score. But this code, hook.py, used the BPEd reference. That is why I got an incorrect log/score before.
You should use an un-BPEd reference file for validation because our code will automatically un-BPE the hypothesis. You can refer to the newest version of the user manual.
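(To sum up the fix in code: a hypothetical helper, not part of the repo, that warns when a reference file handed to validation still contains BPE continuation markers, since the hypothesis is un-BPEd automatically while the reference is used as-is.)

```python
# Hypothetical check (not from the repo): a validation reference file should
# contain plain words, because the hypothesis is already de-BPEd for scoring.
def looks_bpe_encoded(path):
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            if "@@ " in line or line.rstrip("\n").endswith("@@"):
                print(f"line {lineno}: reference still looks BPE-encoded")
                return True
    return False

# Example usage: looks_bpe_encoded("newstest2014.de") should return False.
```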