thu-coai/DA-Transformer

Cannot reproduce the result when factor=4

Closed this issue · 7 comments

Hello, I tried to reproduce the factor=4 setting with lookahead decoding, but I get 25.64 BLEU, which is lower than the 26.14 BLEU reported for WMT'14 EN-DE raw data in the paper. I used the same environment, the same training script, the same decoding script, and the same dataset, but still failed to match the reported score. Can you help me? Or could you share the checkpoints trained on the WMT'14 EN-DE raw and distilled data?

My Training Script

fairseq-train ${data_dir}  \
    \
    `# loading DA-Transformer plugins` \
    --user-dir fs_plugins \
    \
    `# DA-Transformer Task Configs` \
    --task translation_dat_task \
    --upsample-base source --upsample-scale 4 \
    --filter-max-length 128:1024 --filter-ratio 2 \
    --skip-invalid-size-inputs-valid-test \
    \
    `# DA-Transformer Architecture Configs` \
    --arch glat_decomposed_link_base \
    --links-feature feature:position \
    --max-source-positions 128 --max-target-positions 1024 \
    --encoder-learned-pos --decoder-learned-pos \
    --share-all-embeddings --activation-fn gelu --apply-bert-init \
    \
    `# DA-Transformer Decoding Configs (See more in the decoding section)` \
    --decode-strategy lookahead --decode-upsample-scale 4.0 \
    \
    `# DA-Transformer Criterion Configs` \
    --criterion nat_dag_loss \
    --length-loss-factor 0 --max-transition-length 99999 \
    --glat-p 0.5:0.1@200k --glance-strategy number-random \
    --no-force-emit \
    \
    `# Optimizer & Regularizer Configs` \
    --optimizer adam --adam-betas '(0.9,0.999)' --fp16 \
    --label-smoothing 0.0 --weight-decay 0.01 --dropout 0.1 \
    --lr-scheduler inverse_sqrt  --warmup-updates 10000   \
    --clip-norm 0.1 --lr 0.0005 --warmup-init-lr '1e-07' --stop-min-lr '1e-09' \
    \
    `# Training Configs` \
    --max-tokens 32392  --max-tokens-valid 4096 --update-freq 1 \
    --max-update 300000  --grouped-shuffling \
    --max-encoder-batch-tokens 8000 --max-decoder-batch-tokens 34000 \
    --seed 0 --ddp-backend c10d --required-batch-size-multiple 1 \
    \
    `# Validation Configs` \
    --valid-subset valid \
    --validate-interval 1       --validate-interval-updates 10000 \
    --eval-bleu --eval-bleu-detok space --eval-bleu-remove-bpe --eval-bleu-print-samples --eval-tokenized-bleu \
    --fixed-validation-seed 7 \
    \
    `# Checkpoint Configs` \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --save-interval 1  --save-interval-updates 10000 \
    --keep-best-checkpoints 5 --save-dir ${checkpoint_dir} \
    \
    `# Logging Configs` \
    --log-format 'simple' --log-interval 100

My Decoding Script

average_checkpoint_path=${checkpoint_dir}/average.pt

python3 ./fs_plugins/scripts/average_checkpoints.py \
  --inputs ${checkpoint_dir} \
  --max-metric \
  --best-checkpoints-metric bleu \
  --num-best-checkpoints-metric 5 \
  --output ${average_checkpoint_path}

fairseq-generate ${data_dir} \
    --gen-subset test --user-dir fs_plugins --task translation_dat_task \
    --remove-bpe --max-tokens 4096 --seed 0 \
    --decode-strategy lookahead --decode-upsample-scale 4 --decode-beta 1  \
    --path ${average_checkpoint_path}

En-De checkpoints on raw data (upsample=8): https://huggingface.co/thu-coai/dat_base_translation_ende
Demo: https://huggingface.co/spaces/thu-coai/DA-Transformer
Demo code (how to use): https://huggingface.co/spaces/thu-coai/DA-Transformer/tree/main
I will write instructions for the checkpoints and the new interface used in the demo later.

For the reproduction problem:

  • Do you use the dataset provided in this repository?
  • I see you use a much larger max_tokens. Have you encountered a warning message that says "clip predicted length... Try a smaller validation batch size, or use a bigger max_decoder_batch_tokens" during your training or decoding? (See the sketch after these questions.)
  • Could you please provide the validation BLEU score during your training?
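
For context on the second question above, here is a minimal sketch of what the "clip predicted length" warning guards against. The function name and the exact clipping rule are my own guesses rather than the repository's actual logic; it only illustrates why a tight --max-decoder-batch-tokens budget can silently truncate the DAG and hurt BLEU:

# Hypothetical sketch, NOT the repository's actual code.
# If a batch's total decoder positions would exceed the token budget,
# the predicted graph length is clipped, which can silently degrade BLEU.
def clip_graph_size(graph_size: int, batch_size: int,
                    max_decoder_batch_tokens: int = 34000) -> int:
    budget = max_decoder_batch_tokens // batch_size  # positions per sentence
    if graph_size > budget:
        print("clip predicted length... Try a smaller validation batch size, "
              "or use a bigger max_decoder_batch_tokens")
        return budget
    return graph_size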

Thank you for your reply! Here are my responses to your questions.

Do you use the dataset provided in this repository?

Yes

I see you use a much larger max_tokens. Have you encountered a warning message that says "clip predicted length... Try a smaller validation batch size, or use a bigger max_decoder_batch_tokens" during your training or decoding?

No. I used 2 A100-80G GPUs to train the models with update_freq=1 and did not encounter the warning.

Could you please provide the validation BLEU score during your training?

The best validation BLEU score is 24.18.

Additionally, to prevent the "graph size too small" error during validation, I used the --skip-invalid-size-inputs-valid-test option and encountered this warning message:
2023-05-23 13:43:21 | WARNING | fs_plugins.tasks.translation_dat | 4,581 samples have invalid sizes and will be skipped, max_positions=(128, 1024) and filter_ratio=2, first few sample ids=[2840886, 2840889, 2192638, 3727174, 2048466, 3558056, 3823897, 3910352, 3470167, 3402828]
Is this normal?

It is normal to filter out excessively long samples, but that has no connection to the --skip-invalid-size-inputs-valid-test flag. It is better not to enable this flag (in translation), as it would remove some examples during validation. Additionally, encountering a "graph size too small" error during validation is unusual. This error indicates that the decoder input is shorter than the target. Did you encounter it during the validation step or the training step?
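
To make the error concrete, here is a minimal sketch of the constraint described above, with hypothetical names (the actual check lives somewhere in fs_plugins): with --upsample-base source, the decoder input length is derived from the source length alone, so a sufficiently long target cannot fit into the DAG.

# Minimal sketch with hypothetical names, not the repository's actual code.
def dag_fits(src_len: int, tgt_len: int, upsample_scale: float = 4.0) -> bool:
    graph_size = int(src_len * upsample_scale)  # decoder input (DAG) length
    return graph_size >= tgt_len                # False -> "graph size too small"

# Example: a 10-token source with a 45-token target breaks factor=4.
print(dag_fits(10, 45))  # False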

By the way, I noticed that my best validation BLEU score was approximately 24.7, so your 24.18 suggests that there might be a problem.

When I set factor=4, the "graph size too small" error happens during the validation step, so I have to set --skip-invalid-size-inputs-valid-test to keep training running normally.

Here is my training log; are there any indicators that look abnormal?

2023-05-23 15:00:47 | INFO | train_inner | epoch 141:     60 / 1964 ntokens=63879.7, nsentences=2148.5, nvalidtokens=228909, invalid_nsentences=0, tokens_perc=3.611, sentences_perc=1, loss=2.314, glat_acc=0.635, glat_keep=0.011, dag_nll=2.314, length_nll=5.545, dag=2.314, length=5.545, bleu=0, wps=46854.4, ups=0.73, wpb=63879.7, bsz=2148.5, num_updates=275000, lr=9.53463e-05, gnorm=1.622, clip=100, loss_scale=8192, train_wall=109, gb_free=37.5, wall=321557
2023-05-23 15:02:41 | INFO | train_inner | epoch 141:    160 / 1964 ntokens=62579.6, nsentences=2049.26, nvalidtokens=231416, invalid_nsentences=0, tokens_perc=3.738, sentences_perc=1, loss=2.326, glat_acc=0.633, glat_keep=0.01, dag_nll=2.326, length_nll=5.544, dag=2.326, length=5.544, bleu=0, wps=54852.3, ups=0.88, wpb=62579.6, bsz=2049.3, num_updates=275100, lr=9.53289e-05, gnorm=1.58, clip=100, loss_scale=8192, train_wall=112, gb_free=31, wall=321671
2023-05-23 15:04:34 | INFO | train_inner | epoch 141:    260 / 1964 ntokens=63347.5, nsentences=2133.08, nvalidtokens=231248, invalid_nsentences=0, tokens_perc=3.692, sentences_perc=1, loss=2.281, glat_acc=0.639, glat_keep=0.01, dag_nll=2.281, length_nll=5.546, dag=2.281, length=5.546, bleu=0, wps=56411.5, ups=0.89, wpb=63347.5, bsz=2133.1, num_updates=275200, lr=9.53116e-05, gnorm=1.55, clip=100, loss_scale=8192, train_wall=110, gb_free=43.3, wall=321784
2023-05-23 15:06:29 | INFO | train_inner | epoch 141:    360 / 1964 ntokens=63649.4, nsentences=1969.41, nvalidtokens=231790, invalid_nsentences=0, tokens_perc=3.672, sentences_perc=1, loss=2.307, glat_acc=0.636, glat_keep=0.011, dag_nll=2.307, length_nll=5.543, dag=2.307, length=5.543, bleu=0, wps=55046.8, ups=0.86, wpb=63649.4, bsz=1969.4, num_updates=275300, lr=9.52943e-05, gnorm=1.633, clip=100, loss_scale=8192, train_wall=113, gb_free=34.6, wall=321899
2023-05-23 15:08:27 | INFO | train_inner | epoch 141:    460 / 1964 ntokens=62795.9, nsentences=2099.39, nvalidtokens=233892, invalid_nsentences=0, tokens_perc=3.759, sentences_perc=1, loss=2.279, glat_acc=0.64, glat_keep=0.01, dag_nll=2.279, length_nll=5.543, dag=2.279, length=5.543, bleu=0, wps=53242.5, ups=0.85, wpb=62795.9, bsz=2099.4, num_updates=275400, lr=9.5277e-05, gnorm=1.607, clip=100, loss_scale=8192, train_wall=116, gb_free=44.8, wall=322017
2023-05-23 15:10:21 | INFO | train_inner | epoch 141:    560 / 1964 ntokens=63121, nsentences=2021.66, nvalidtokens=232073, invalid_nsentences=0, tokens_perc=3.714, sentences_perc=1, loss=2.275, glat_acc=0.642, glat_keep=0.01, dag_nll=2.275, length_nll=5.546, dag=2.275, length=5.546, bleu=0, wps=55546.9, ups=0.88, wpb=63121, bsz=2021.7, num_updates=275500, lr=9.52597e-05, gnorm=1.724, clip=100, loss_scale=8192, train_wall=111, gb_free=39.3, wall=322131
2023-05-23 15:12:17 | INFO | train_inner | epoch 141:    660 / 1964 ntokens=63817.7, nsentences=1905.68, nvalidtokens=231597, invalid_nsentences=0, tokens_perc=3.649, sentences_perc=1, loss=2.269, glat_acc=0.642, glat_keep=0.01, dag_nll=2.269, length_nll=5.544, dag=2.269, length=5.544, bleu=0, wps=54984, ups=0.86, wpb=63817.7, bsz=1905.7, num_updates=275600, lr=9.52424e-05, gnorm=1.719, clip=100, loss_scale=8192, train_wall=114, gb_free=34.3, wall=322247
2023-05-23 15:12:45 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 4096.0
2023-05-23 15:14:17 | INFO | train_inner | epoch 141:    761 / 1964 ntokens=63146.2, nsentences=1893.79, nvalidtokens=235246, invalid_nsentences=0, tokens_perc=3.759, sentences_perc=1, loss=2.26, glat_acc=0.642, glat_keep=0.01, dag_nll=2.26, length_nll=5.544, dag=2.26, length=5.544, bleu=0, wps=52771, ups=0.84, wpb=63146.2, bsz=1893.8, num_updates=275700, lr=9.52251e-05, gnorm=1.761, clip=100, loss_scale=4096, train_wall=117, gb_free=42.8, wall=322367
2023-05-23 15:16:10 | INFO | train_inner | epoch 141:    861 / 1964 ntokens=62827.3, nsentences=2065.84, nvalidtokens=227384, invalid_nsentences=0, tokens_perc=3.657, sentences_perc=1, loss=2.403, glat_acc=0.622, glat_keep=0.011, dag_nll=2.403, length_nll=5.547, dag=2.403, length=5.547, bleu=0, wps=55600.1, ups=0.88, wpb=62827.3, bsz=2065.8, num_updates=275800, lr=9.52079e-05, gnorm=1.769, clip=100, loss_scale=4096, train_wall=111, gb_free=45.1, wall=322480
2023-05-23 15:18:05 | INFO | train_inner | epoch 141:    961 / 1964 ntokens=62120.2, nsentences=2020.57, nvalidtokens=234541, invalid_nsentences=0, tokens_perc=3.813, sentences_perc=1, loss=2.249, glat_acc=0.644, glat_keep=0.01, dag_nll=2.249, length_nll=5.547, dag=2.249, length=5.547, bleu=0, wps=53699.8, ups=0.86, wpb=62120.2, bsz=2020.6, num_updates=275900, lr=9.51906e-05, gnorm=1.679, clip=100, loss_scale=4096, train_wall=113, gb_free=41.4, wall=322595
2023-05-23 15:20:01 | INFO | train_inner | epoch 141:   1061 / 1964 ntokens=62374.8, nsentences=2211.01, nvalidtokens=235619, invalid_nsentences=0, tokens_perc=3.828, sentences_perc=1, loss=2.236, glat_acc=0.649, glat_keep=0.01, dag_nll=2.236, length_nll=5.544, dag=2.236, length=5.544, bleu=0, wps=54166.2, ups=0.87, wpb=62374.8, bsz=2211, num_updates=276000, lr=9.51734e-05, gnorm=1.778, clip=100, loss_scale=4096, train_wall=113, gb_free=39.9, wall=322710
2023-05-23 15:21:54 | INFO | train_inner | epoch 141:   1161 / 1964 ntokens=63432.1, nsentences=2069.75, nvalidtokens=227712, invalid_nsentences=0, tokens_perc=3.624, sentences_perc=1, loss=2.348, glat_acc=0.63, glat_keep=0.011, dag_nll=2.348, length_nll=5.547, dag=2.348, length=5.547, bleu=0, wps=55981.7, ups=0.88, wpb=63432.1, bsz=2069.8, num_updates=276100, lr=9.51561e-05, gnorm=1.819, clip=100, loss_scale=4096, train_wall=111, gb_free=36, wall=322824
2023-05-23 15:23:46 | INFO | train_inner | epoch 141:   1261 / 1964 ntokens=63781.1, nsentences=1932.89, nvalidtokens=226399, invalid_nsentences=0, tokens_perc=3.574, sentences_perc=1, loss=2.394, glat_acc=0.622, glat_keep=0.011, dag_nll=2.394, length_nll=5.545, dag=2.394, length=5.545, bleu=0, wps=57029.5, ups=0.89, wpb=63781.1, bsz=1932.9, num_updates=276200, lr=9.51389e-05, gnorm=1.777, clip=100, loss_scale=4096, train_wall=109, gb_free=45.1, wall=322936
2023-05-23 15:25:40 | INFO | train_inner | epoch 141:   1361 / 1964 ntokens=62655.2, nsentences=1940.58, nvalidtokens=231785, invalid_nsentences=0, tokens_perc=3.742, sentences_perc=1, loss=2.309, glat_acc=0.637, glat_keep=0.01, dag_nll=2.309, length_nll=5.546, dag=2.309, length=5.546, bleu=0, wps=54577.8, ups=0.87, wpb=62655.2, bsz=1940.6, num_updates=276300, lr=9.51217e-05, gnorm=1.808, clip=100, loss_scale=4096, train_wall=112, gb_free=40.2, wall=323050
2023-05-23 15:27:35 | INFO | train_inner | epoch 141:   1461 / 1964 ntokens=62801.9, nsentences=1829.91, nvalidtokens=229145, invalid_nsentences=0, tokens_perc=3.689, sentences_perc=1, loss=2.389, glat_acc=0.624, glat_keep=0.011, dag_nll=2.389, length_nll=5.545, dag=2.389, length=5.545, bleu=0, wps=54686.6, ups=0.87, wpb=62801.9, bsz=1829.9, num_updates=276400, lr=9.51045e-05, gnorm=1.813, clip=100, loss_scale=4096, train_wall=112, gb_free=36, wall=323165
2023-05-23 15:28:17 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 2048.0
2023-05-23 15:29:30 | INFO | train_inner | epoch 141:   1562 / 1964 ntokens=62524.5, nsentences=2024.61, nvalidtokens=228430, invalid_nsentences=0, tokens_perc=3.691, sentences_perc=1, loss=2.368, glat_acc=0.626, glat_keep=0.011, dag_nll=2.368, length_nll=5.543, dag=2.368, length=5.543, bleu=0, wps=54661.6, ups=0.87, wpb=62524.5, bsz=2024.6, num_updates=276500, lr=9.50873e-05, gnorm=1.753, clip=100, loss_scale=2048, train_wall=112, gb_free=41.2, wall=323280
2023-05-23 15:31:25 | INFO | train_inner | epoch 141:   1662 / 1964 ntokens=61952, nsentences=1890.37, nvalidtokens=230334, invalid_nsentences=0, tokens_perc=3.765, sentences_perc=1, loss=2.44, glat_acc=0.615, glat_keep=0.011, dag_nll=2.44, length_nll=5.546, dag=2.44, length=5.546, bleu=0, wps=53935.9, ups=0.87, wpb=61952, bsz=1890.4, num_updates=276600, lr=9.50701e-05, gnorm=1.994, clip=100, loss_scale=2048, train_wall=112, gb_free=31, wall=323394
2023-05-23 15:33:21 | INFO | train_inner | epoch 141:   1762 / 1964 ntokens=62568.8, nsentences=2037.36, nvalidtokens=230812, invalid_nsentences=0, tokens_perc=3.732, sentences_perc=1, loss=2.351, glat_acc=0.63, glat_keep=0.011, dag_nll=2.351, length_nll=5.543, dag=2.351, length=5.543, bleu=0, wps=53930.7, ups=0.86, wpb=62568.8, bsz=2037.4, num_updates=276700, lr=9.50529e-05, gnorm=1.907, clip=100, loss_scale=2048, train_wall=114, gb_free=41, wall=323510
2023-05-23 15:35:20 | INFO | train_inner | epoch 141:   1862 / 1964 ntokens=62526.2, nsentences=1748.68, nvalidtokens=233405, invalid_nsentences=0, tokens_perc=3.772, sentences_perc=1, loss=2.374, glat_acc=0.626, glat_keep=0.011, dag_nll=2.374, length_nll=5.544, dag=2.374, length=5.544, bleu=0, wps=52445.6, ups=0.84, wpb=62526.2, bsz=1748.7, num_updates=276800, lr=9.50357e-05, gnorm=1.978, clip=100, loss_scale=2048, train_wall=117, gb_free=31.5, wall=323630
fairseq plugins loaded...
fairseq plugins loaded...
2023-05-23 15:37:14 | INFO | train_inner | epoch 141:   1962 / 1964 ntokens=62287.8, nsentences=2333.56, nvalidtokens=235691, invalid_nsentences=0, tokens_perc=3.834, sentences_perc=1, loss=2.193, glat_acc=0.655, glat_keep=0.009, dag_nll=2.193, length_nll=5.545, dag=2.193, length=5.545, bleu=0, wps=54394.1, ups=0.87, wpb=62287.8, bsz=2333.6, num_updates=276900, lr=9.50186e-05, gnorm=1.884, clip=100, loss_scale=2048, train_wall=112, gb_free=34.6, wall=323744
2023-05-23 15:37:16 | INFO | fairseq_cli.train | begin validation on "valid" subset
2023-05-23 15:37:22 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Sie konnten den Schaden teilweise begrenzen .
2023-05-23 15:37:22 | INFO | fs_plugins.tasks.translation_dat | example reference: Sie konnten die Schäden teilweise begrenzen .
2023-05-23 15:37:23 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Es half auch dazu beigetragen , der Serie ein Plattenpublikum zu geben .
2023-05-23 15:37:23 | INFO | fs_plugins.tasks.translation_dat | example reference: Damit erreichte die Serie einen Zuschauerrekord in jener Zeit .
2023-05-23 15:37:23 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Das einzig positive Punkt das Folterverbot in Artikel 36 .
2023-05-23 15:37:23 | INFO | fs_plugins.tasks.translation_dat | example reference: Ihrer Meinung nach ist der einzige positive Punkt das Verbot der Folter durch Artikel 36.
2023-05-23 15:37:23 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Der Vater des Kindes , Andrian Nikolaev war auch ein Kosmonaut .
2023-05-23 15:37:23 | INFO | fs_plugins.tasks.translation_dat | example reference: Der Vater des Kindes , Andrijan Nikolajew , war ebenfalls Kosmonaut .
fairseq plugins loaded...
fairseq plugins loaded...
2023-05-23 15:37:24 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Ich glaube , dass in Europa und auch in den USA ein großes Verständnis für die deutsche Position .
2023-05-23 15:37:24 | INFO | fs_plugins.tasks.translation_dat | example reference: Ich glaube , in Europa und sogar in den USA herrscht großes Verständnis für die deutsche Position .
2023-05-23 15:37:24 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Wie Sie wissen , ist Azcarraga Andrade der Haupttaktionschafter der Hotelkette Posadas .
2023-05-23 15:37:24 | INFO | fs_plugins.tasks.translation_dat | example reference: Wie Sie wissen , ist Azcárraga Andrade der Hauptaktionär der Hotelkette Posadas .
2023-05-23 15:37:25 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Ich glaube , dass sie in dieser Rolle noch finden , aber sie sind auf dem Weg zu einer " normaleren " Außenpolitik .
2023-05-23 15:37:25 | INFO | fs_plugins.tasks.translation_dat | example reference: Ich glaube , sie tasten sich noch an ihre Rolle heran , aber sie sind auf dem Weg zu einer " normaleren " Außenpolitik .
2023-05-23 15:37:25 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: " Das Internet entstand als Mittel der Suche nach Informationen , aber mit dem Auftreten dieser Websites haben sich die Spielregeln des Spiels " , sagt er .
2023-05-23 15:37:25 | INFO | fs_plugins.tasks.translation_dat | example reference: " Das Internet entstand als Medium für die Suche nach Informationen ; mit dem Erscheinen dieser Seiten haben sich die Spielregeln aber geändert " , sagt sie .
2023-05-23 15:37:25 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Die Kausalität des Aufstiegs der faschistischen und kommunistischen Regimes sollte daher in der fehlgeleiteten Liberalisierung des Wirtschaftssystems im 19. und 20. Jahrhundert gesucht werden .
2023-05-23 15:37:25 | INFO | fs_plugins.tasks.translation_dat | example reference: Die Ursache für das Aufkommen des Faschismus und des Kommunismus müssen wir daher in der rücksichtslosen Liberalisierung der Wirtschaftssysteme im 19. und 20. Jahrhundert suchen .
2023-05-23 15:37:26 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Laut dem Robert Koch Instituts hat sich die Zahl der Syphilis @-@ Infektionen von 1.697 Fällen im Jahr 2001 bis 3.698 Fällen im Jahr 2011 .
2023-05-23 15:37:26 | INFO | fs_plugins.tasks.translation_dat | example reference: Die Zahl der Syphilis-Infektionen hat sich laut dem Robert-Koch-Institut von 1697 Erkrankungen im Jahr 2001 auf 3698 Erkrankungen im Jahr 2011 mehr als verdoppelt .
2023-05-23 15:37:26 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Eine Reihe von Menschen hoffen außerdem leidenft , dass einige gefunden werden , weil der geringste Unterschied könnte eine Tür zu einer " neuen Physik " öffnen und bestimmte Löcher im Modell .
2023-05-23 15:37:26 | INFO | fs_plugins.tasks.translation_dat | example reference: Es gibt übrigens zahlreiche Physiker , die sehnlichst hoffen , dass solche Abweichungen gefunden werden , da der geringste Unterschied eine Tür zu einer " neuen Physik " öffnen und einige Löcher des Modells stopfen könnte .
2023-05-23 15:37:26 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Für 8.733 km von Heidelberg im Nordwesten des Königreichs Swasiland liegt das Dorf Esitjeni , das auf die Stimmskraft des deutschen Chors stützt .
2023-05-23 15:37:26 | INFO | fs_plugins.tasks.translation_dat | example reference: Denn 8 733 Kilometer Luftlinie von Heidelberg entfernt , im Nordwesten des Königreichs Swasiland , gibt es das Dorf Esitjeni , das von der Stimmgewalt des deutschen Chores abhängt .
2023-05-23 15:37:27 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Mit Rolle wie " Canito ' Nieves , Pablo Alicea und der jungen Rolando Hourruitiner ersetzt die Spieler nach dem Trüameln der Mar del Plata Pan-American Games ausgesetzt wurden , haben wir Gold gegen alle Chancen .
2023-05-23 15:37:27 | INFO | fs_plugins.tasks.translation_dat | example reference: Mit Spielern wie " Canito " Nieves , Pablo Alicea und dem jungen Rolando Hourruitiner als Ersatz für die gesperrten Spieler wegen der Vorkommnisse auf den Panamerikanischen Spielen in Mar del Plata , gewannen wir entgegen jeder Prognose die Goldmedaille .
2023-05-23 15:37:27 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Correa unterstützte auch die Entscheidung , das Veto gegen Paraguay in UNASUR zumindest zu den nächsten Wahlen beizubehalten , und argumentiert , dass das Gremium " entschlossen sein und Opportunismus und einen mit Legmäßigkeit maskierten Staatsstreich nicht tolerieren " , weil dies " die Legitimität der Paraguayanischen Demokratie zerstören " .
2023-05-23 15:37:27 | INFO | fs_plugins.tasks.translation_dat | example reference: Correa unterstützte auch die Entscheidung , das Veto gegen Paraguay in der Unasur mindestens bis zu seinen nächsten Wahlen aufrechtzuerhalten , wobei er argumentierte , dass der Organismus " hart bleiben muss und keinem Opportunismus und keinem als legal verkleideten Staatsstreich stattgeben darf " , weil dadurch " die Legitimität der Demokratie in Paraguay zerstört wurde . "
2023-05-23 15:37:27 | INFO | fs_plugins.tasks.translation_dat | example hypothesis: Israels gegenwärtige israelische Ministerpräsident Netanjahu " der Falken " ist ein typisches Beispiel für einen faschistischen Politiker , der den internationalen Bankern loyal ist und alles tut , um den Krieg mit dem Iran zu zetfachen , der aufgrund seiner Mitgliedschaft in der ShanghaOrganisation für Zusammenarbeit ( China , Indien , Russland , Pakistan , ... ) zu einer größeren Bedrohung eines globalen Konflikts und durch seine Kontrolle der Hormuz , wo 20 % des Öls der Welt ( der Kanal ist nur 2 Meilen breit ) , zur Zerstörung der Weltwirtschaft führen würde .
2023-05-23 15:37:27 | INFO | fs_plugins.tasks.translation_dat | example reference: Der derzeitige Premierminister Israels , der Falke Netanjahu , ist ein typisches Beispiel eines faschismusanfälligen , den internationalen Bankern loyal ergebenen Politikers , der alles dafür tut , um einen Krieg mit dem Iran zu entfachen , welcher sich angesichts der Mitgliedschaft Irans in der Schanghaier Organisation für Zusammenarbeit ( China , Indien , Russland , Pakistan ... ) , rasch zu einem globalen Konflikt ausweiten könnte , und bei dem es wegen der Kontrolle Irans über die nur 2 Meilen breite Straße von Hormus , über die 20 % der weltweiten Erdöllieferungen laufen , zu einer Zerstörung der Weltwirtschaft kommen könnte .
2023-05-23 15:37:27 | INFO | valid | epoch 141 | valid on 'valid' subset | ntokens 5837.24 | nsentences 206.241 | nvalidtokens 20568.9 | invalid_nsentences 0 | tokens_perc 3.519 | sentences_perc 1 | loss 2.371 | glat_acc 0.633 | glat_keep 0.01 | dag_nll 2.371 | length_nll 5.545 | dag 2.371 | length 5.545 | bleu 23.32 | wps 15438.6 | wpb 5690.1 | bsz 199.8 | num_updates 276902 | best_bleu 24.18
2023-05-23 15:37:27 | INFO | fairseq.checkpoint_utils | Preparing to save checkpoint for epoch 141 @ 276902 updates
2023-05-23 15:37:27 | INFO | fairseq.trainer | Saving checkpoint to /home/cs/yangyuchen/yushengliao/NMT/checkpoints/ende_dag_raw/checkpoint141.pt
2023-05-23 15:37:29 | WARNING | fs_plugins.tasks.translation_dat | 4,581 samples have invalid sizes and will be skipped, max_positions=(128, 1024) and filter_ratio=2, first few sample ids=[2840886, 2840889, 2192638, 3727174, 2048466, 3558056, 3823897, 3910352, 3470167, 3402828]
2023-05-23 15:37:30 | INFO | fairseq.trainer | Finished saving checkpoint to /home/cs/yangyuchen/yushengliao/NMT/checkpoints/ende_dag_raw/checkpoint141.pt
2023-05-23 15:37:31 | INFO | fairseq.checkpoint_utils | Saved checkpoint /home/cs/yangyuchen/yushengliao/NMT/checkpoints/ende_dag_raw/checkpoint141.pt (epoch 141 @ 276902 updates, score 23.32) (writing took 3.5476756650023162 seconds)
2023-05-23 15:37:31 | INFO | fairseq_cli.train | end of epoch 141 (average epoch stats below)
2023-05-23 15:37:31 | INFO | train | epoch 141 | ntokens 62871.4 | nsentences 2015.37 | nvalidtokens 231397 | invalid_nsentences 0 | tokens_perc 3.719 | sentences_perc 1 | loss 2.319 | glat_acc 0.634 | glat_keep 0.01 | dag_nll 2.319 | length_nll 5.545 | dag 2.319 | length 5.545 | bleu 0 | wps 54086.3 | ups 0.86 | wpb 62871.4 | bsz 2015.4 | num_updates 276902 | lr 9.50182e-05 | gnorm 1.762 | clip 100 | loss_scale 2048 | train_wall 2210 | gb_free 29.1 | wall 323761
2023-05-23 15:37:33 | WARNING | fs_plugins.tasks.translation_dat | 4,581 samples have invalid sizes and will be skipped, max_positions=(128, 1024) and filter_ratio=2, first few sample ids=[2840886, 2840889, 2192638, 3727174, 2048466, 3558056, 3823897, 3910352, 3470167, 3402828]
2023-05-23 15:37:34 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1964
2023-05-23 15:37:34 | INFO | fairseq.trainer | begin training epoch 142
2023-05-23 15:37:34 | INFO | fairseq_cli.train | Start iterating over samples
2023-05-23 15:39:35 | INFO | train_inner | epoch 142:     98 / 1964 ntokens=62923.6, nsentences=2241.01, nvalidtokens=232107, invalid_nsentences=0, tokens_perc=3.729, sentences_perc=1, loss=2.218, glat_acc=0.65, glat_keep=0.01, dag_nll=2.218, length_nll=5.543, dag=2.218, length=5.543, bleu=0, wps=44770.6, ups=0.71, wpb=62923.6, bsz=2241, num_updates=277000, lr=9.50014e-05, gnorm=1.865, clip=100, loss_scale=2048, train_wall=113, gb_free=40.2, wall=323885
2023-05-23 15:41:31 | INFO | train_inner | epoch 142:    198 / 1964 ntokens=62906, nsentences=1998.72, nvalidtokens=233115, invalid_nsentences=0, tokens_perc=3.744, sentences_perc=1, loss=2.315, glat_acc=0.634, glat_keep=0.01, dag_nll=2.315, length_nll=5.545, dag=2.315, length=5.545, bleu=0, wps=54145.5, ups=0.86, wpb=62906, bsz=1998.7, num_updates=277100, lr=9.49843e-05, gnorm=1.943, clip=100, loss_scale=2048, train_wall=114, gb_free=37.5, wall=324001
2023-05-23 15:43:25 | INFO | train_inner | epoch 142:    298 / 1964 ntokens=62921.2, nsentences=2083, nvalidtokens=233577, invalid_nsentences=0, tokens_perc=3.757, sentences_perc=1, loss=2.274, glat_acc=0.642, glat_keep=0.01, dag_nll=2.274, length_nll=5.545, dag=2.274, length=5.545, bleu=0, wps=55228.6, ups=0.88, wpb=62921.2, bsz=2083, num_updates=277200, lr=9.49671e-05, gnorm=1.922, clip=100, loss_scale=2048, train_wall=111, gb_free=38, wall=324115
2023-05-23 15:45:20 | INFO | train_inner | epoch 142:    398 / 1964 ntokens=63804.9, nsentences=1994.9, nvalidtokens=234508, invalid_nsentences=0, tokens_perc=3.707, sentences_perc=1, loss=2.214, glat_acc=0.65, glat_keep=0.01, dag_nll=2.214, length_nll=5.545, dag=2.214, length=5.545, bleu=0, wps=55552.4, ups=0.87, wpb=63804.9, bsz=1994.9, num_updates=277300, lr=9.495e-05, gnorm=1.857, clip=100, loss_scale=2048, train_wall=112, gb_free=38.6, wall=324230
2023-05-23 15:47:16 | INFO | train_inner | epoch 142:    498 / 1964 ntokens=63317.3, nsentences=2006.21, nvalidtokens=233460, invalid_nsentences=0, tokens_perc=3.714, sentences_perc=1, loss=2.262, glat_acc=0.644, glat_keep=0.01, dag_nll=2.262, length_nll=5.544, dag=2.262, length=5.544, bleu=0, wps=54280.4, ups=0.86, wpb=63317.3, bsz=2006.2, num_updates=277400, lr=9.49329e-05, gnorm=1.849, clip=100, loss_scale=2048, train_wall=114, gb_free=45.9, wall=324346
2023-05-23 15:49:12 | INFO | train_inner | epoch 142:    598 / 1964 ntokens=63117.4, nsentences=1931.94, nvalidtokens=232004, invalid_nsentences=0, tokens_perc=3.707, sentences_perc=1, loss=2.291, glat_acc=0.637, glat_keep=0.01, dag_nll=2.291, length_nll=5.547, dag=2.291, length=5.547, bleu=0, wps=54712.6, ups=0.87, wpb=63117.4, bsz=1931.9, num_updates=277500, lr=9.49158e-05, gnorm=1.882, clip=100, loss_scale=2048, train_wall=113, gb_free=42.8, wall=324462
2023-05-23 15:51:07 | INFO | train_inner | epoch 142:    698 / 1964 ntokens=63197.2, nsentences=2010.2, nvalidtokens=233443, invalid_nsentences=0, tokens_perc=3.72, sentences_perc=1, loss=2.256, glat_acc=0.645, glat_keep=0.01, dag_nll=2.256, length_nll=5.542, dag=2.256, length=5.542, bleu=0, wps=54707.5, ups=0.87, wpb=63197.2, bsz=2010.2, num_updates=277600, lr=9.48987e-05, gnorm=1.858, clip=100, loss_scale=2048, train_wall=113, gb_free=41.6, wall=324577

@BlueZeros I have found that the option --skip-invalid-size-inputs-valid-test is necessary because there is a sample in the validation set whose target is much longer than its source (target_length / source_length > 4). I didn't encounter this problem because the results in the paper were obtained with the old code. The old code includes the <bos> and <eos> tokens when computing the upsampled length, resulting in a larger DAG size.
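
The difference can be illustrated with a rough sketch. The "+ 2" for <bos> and <eos> is my reading of the explanation above, not the repository's exact code:

# Rough illustration, not the repository's actual code.
def graph_size_new(src_tokens: int, scale: int = 4) -> int:
    return src_tokens * scale        # --upsample-base source

def graph_size_old(src_tokens: int, scale: int = 4) -> int:
    return (src_tokens + 2) * scale  # --upsample-base source_old (<bos>/<eos> counted)

# The extra 2 * scale positions of the old scheme are what let the
# problematic validation sample (target/source > 4) fit into the DAG.
print(graph_size_new(10), graph_size_old(10))  # 40 48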

To fix this issue, I believe we can replace --upsample-base source with --upsample-base source_old, which uses the old upsampling method. However, I'm not certain whether there are other differences between the two codebases. I am currently running a trial to reproduce the results; you can either wait for my results or try it yourself. If this fix is confirmed, I'll update the script accordingly. Thank you!

Thank you again! I also found that only the experiment on WMT'14 EN-DE raw data requires the --skip-invalid-size-inputs-valid-test option and yields lower scores. There are no issues with the WMT'14 DE-EN data, and its final result is normal. I hope this helps.

I confirmed that using --upsample-base source_old is better than --upsample-base source on the WMT'14 EN-DE raw data: it gives 26.05 BLEU on the test set (close to the reported 26.14).

I will update the training scripts. Thank you!