nusnlp/mlconvgec2018

Accuracy of trained model?

Closed this issue · 12 comments

I've trained the mlconv model using train_embed.sh, with the hyperparameters set in the script.
The training ended without error in 5 epochs.

But I can't reproduce the F0.5 score in the paper.
My model achieved an F0.5 score of 0.18
(far from the reported F0.5 of 0.45).
The output (output.tok.txt) was also terrible.

Has anyone else run into this problem?

Here's my training log:

`

  + set -e
  + source ../paths.sh
    ++++ dirname ../paths.sh
    +++ cd ..
    +++ pwd
    ++ BASE_DIR=/home/account/torch_gec/mlconvgec2018
    ++ DATA_DIR=/home/account/torch_gec/mlconvgec2018/data
    ++ MODEL_DIR=/home/account/torch_gec/mlconvgec2018/models
    ++ SCRIPTS_DIR=/home/account/torch_gec/mlconvgec2018/scripts
    ++ SOFTWARE_DIR=/home/account/torch_gec/mlconvgec2018/software
  + FAIRSEQPY=/home/account/torch_gec/mlconvgec2018/software/fairseq-py
  + EMBED_PATH=/home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec
  + '[' '!' -f /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec ']'
  + SEED=1000
  + DATA_BIN_DIR=processed/bin
  + OUT_DIR=models/mlconv_embed/model1000/
  + mkdir -p models/mlconv_embed/model1000/
  + PYTHONPATH=/home/account/torch_gec/mlconvgec2018/software/fairseq-py:
  + CUDA_VISIBLE_DEVICES=0
  + python3.5 /home/account/torch_gec/mlconvgec2018/software/fairseq-py/train.py --save-dir models/mlconv_embed/model1000/ --encoder-embed-dim 500 --encoder-embed-path /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec --decoder-embed-dim 500 --decoder-embed-path /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec --decoder-out-embed-dim 500 --dropout 0.2 --clip-norm 0.1 --lr 0.25 --min-lr 1e-4 --encoder-layers '[(1024,3)] * 7' --decoder-layers '[(1024,3)] * 7' --momentum 0.99 --max-epoch 100 --batch-size 32 --seed 1000 processed/bin
    Namespace(arch='fconv', batch_size=32, clip_norm=0.1, data='processed/bin', decoder_attention='True', decoder_embed_dim=500, decoder_embed_path='/home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec', decoder_layers='[(1024,3)] * 7', decoder_out_embed_dim=500, dropout=0.2, encoder_embed_dim=500, encoder_embed_path='/home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec', encoder_layers='[(1024,3)] * 7', force_anneal=0, label_smoothing=0, log_interval=1000, lr=0.25, lrshrink=0.1, max_epoch=100, max_positions=1024, max_tokens=0, min_lr=0.0001, model='fconv', momentum=0.99, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, restore_file='checkpoint_last.pt', sample_without_replacement=0, save_dir='models/mlconv_embed/model1000/', save_interval=-1, seed=1000, source_lang=None, target_lang=None, test_batch_size=32, test_subset='test', train_subset='train', valid_batch_size=32, valid_script=None, valid_subset='valid', weight_decay=0.0, workers=1)
    | [src] dictionary: 30004 types
    | [trg] dictionary: 30004 types
    | processed/bin valid 5448 examples
    | processed/bin train 1298763 examples
    | processed/bin test 5448 examples
    | using 1 GPUs (with max tokens per GPU = None)
    | model fconv
    | Loading encoder embeddings from /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec
    | Found 25760/30004 types in embeddings file.
    | Loading decoder embeddings from /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec
    | Found 25678/30004 types in embeddings file.
    training epoch: 1
    | epoch 001: 0%| | 0/40587 [00:00<?, ?it/s]/home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py:172: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
    x = F.softmax(x.view(sz[0] * sz[1], sz[2]))
    | epoch 001 | train loss 2.77 | train ppl 6.82 | s/checkpoint 4717 | words/s 5024 | words/batch 584 | bsz 32 | lr 0.250000 | clip 100% | gnorm 0.7062
    | epoch 001 | valid on 'valid' subset: 0%| | 0/2738 [00:00<?, ?it/s]/home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/utils.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
    return Variable(tensor, volatile=volatile)
    /home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/multiprocessing_trainer.py:213: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
    return loss.data[0]
    | epoch 001 | valid on 'valid' subset | valid loss 10.83 | valid ppl 1823.06
    | epoch 001 | saving checkpoint
    | epoch 001 | saving best checkpoint
    | epoch 001 | saving last checkpoint
    training epoch: 2
    | epoch 002 | train loss 2.12 | train ppl 4.35 | s/checkpoint 4726 | words/s 5014 | words/batch 584 | bsz 32 | lr 0.250000 | clip 100% | gnorm 0.4339
    | epoch 002 | valid on 'valid' subset | valid loss 12.48 | valid ppl 5704.88
    | epoch 002 | saving checkpoint
    | epoch 002 | saving last checkpoint
    training epoch: 3
    | epoch 003 | train loss 1.80 | train ppl 3.47 | s/checkpoint 4731 | words/s 5009 | words/batch 584 | bsz 32 | lr 0.025000 | clip 100% | gnorm 0.3773
    | epoch 003 | valid on 'valid' subset | valid loss 11.59 | valid ppl 3079.46
    | epoch 003 | saving checkpoint
    | epoch 003 | saving last checkpoint
    training epoch: 4
    | epoch 004 | train loss 1.73 | train ppl 3.31 | s/checkpoint 4734 | words/s 5006 | words/batch 584 | bsz 32 | lr 0.002500 | clip 100% | gnorm 0.3848
    | epoch 004 | valid on 'valid' subset | valid loss 10.10 | valid ppl 1096.12
    | epoch 004 | saving checkpoint
    | epoch 004 | saving best checkpoint
    | epoch 004 | saving last checkpoint
    training epoch: 5
    | epoch 005 | train loss 1.72 | train ppl 3.29 | s/checkpoint 4742 | words/s 4997 | words/batch 584 | bsz 32 | lr 0.002500 | clip 100% | gnorm 0.3883
    | epoch 005 | valid on 'valid' subset | valid loss 12.75 | valid ppl 6908.31
    | epoch 005 | saving checkpoint
    | epoch 005 | saving last checkpoint
    training epoch: 6
    | epoch 006 | train loss 1.71 | train ppl 3.28 | s/checkpoint 4744 | words/s 4995 | words/batch 584 | bsz 32 | lr 0.000250 | clip 100% | gnorm 0.3894
    | epoch 006 | valid on 'valid' subset | valid loss 10.83 | valid ppl 1823.29
    | epoch 006 | saving checkpoint
    | epoch 006 | saving last checkpoint
    | done training in 28644.4 seconds
    /home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/utils.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
    return Variable(tensor, volatile=volatile)
    /home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py:172: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
    x = F.softmax(x.view(sz[0] * sz[1], sz[2]))
    /home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/sequence_generator.py:357: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
    probs = F.softmax(decoder_out[:, -1, :]).data
    | Test on test with beam=1: BLEU4 = 79.83, 90.3/82.8/76.5/70.9 (BP=1.000, ratio=0.991, syslen=148117, reflen=146808)
    | Test on test with beam=5: BLEU4 = 80.41, 91.0/83.4/77.1/71.5 (BP=1.000, ratio=0.992, syslen=147983, reflen=146808)
    | Test on test with beam=10: BLEU4 = 80.58, 91.1/83.6/77.3/71.7 (BP=1.000, ratio=0.991, syslen=148081, reflen=146808)
    | Test on test with beam=20: BLEU4 = 80.64, 91.1/83.6/77.4/71.8 (BP=1.000, ratio=0.991, syslen=148140, reflen=146808)
    `
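
For anyone comparing their own run against the log above, here is a minimal Python sketch that pulls the per-epoch train and valid loss out of such a log. The regular expressions are fitted only to the summary lines shown in this post (an assumption: other fairseq versions may print them differently), and `summarize` is a hypothetical helper, not part of fairseq. It also illustrates that the perplexities in this log are consistent with 2 raised to the loss.

```python
import re

# Patterns fitted to the epoch summary lines in the log above.
# Assumption: other fairseq versions may format these lines differently.
TRAIN_RE = re.compile(r"\| epoch (\d+) \| train loss ([\d.]+) \| train ppl ([\d.]+)")
VALID_RE = re.compile(
    r"\| epoch (\d+) \| valid on '\w+' subset \| valid loss ([\d.]+) \| valid ppl ([\d.]+)"
)


def summarize(log_text):
    """Return {epoch: {"train": loss, "valid": loss}} parsed from a training log."""
    losses = {}
    for line in log_text.splitlines():
        for key, pattern in (("train", TRAIN_RE), ("valid", VALID_RE)):
            match = pattern.search(line)
            if match:
                epoch = int(match.group(1))
                losses.setdefault(epoch, {})[key] = float(match.group(2))
    return losses


# Two lines copied (and shortened) from the log in this post.
sample = """\
| epoch 001 | train loss 2.77 | train ppl 6.82 | s/checkpoint 4717
| epoch 001 | valid on 'valid' subset | valid loss 10.83 | valid ppl 1823.06
"""

for epoch, loss in sorted(summarize(sample).items()):
    gap = loss["valid"] - loss["train"]
    print(f"epoch {epoch}: train {loss['train']:.2f}, valid {loss['valid']:.2f}, gap {gap:.2f}")
    # The logged perplexity is consistent with 2 ** loss (a base-2 loss).
    print(f"  2 ** train loss = {2 ** loss['train']:.2f}")
```

In the log above the train loss falls from 2.77 to 1.71 while the valid loss stays between roughly 10 and 13. The sketch only tabulates that gap; it does not say what causes it.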

I downloaded the two training datasets (nucle and lang8v2) and ran prepare_data.sh and preprocess.sh.
Both ran without error.

Can you share your train log?

I have one in the original post.
I'll upload another after the next training run.

Sorry, I missed the log file.

You seem to be using a newer version of PyTorch than the one we used for this project. We used an old fork of Fairseq (https://github.com/shamilcm/fairseq-py), which required PyTorch 0.2.0 compiled from source.

If you want to use a later version of Fairseq (v0.5), you can use the scripts in the fairseq0.5 branch of our repository (https://github.com/nusnlp/mlconvgec2018/tree/fairseq0.5). This has been tested to work with PyTorch 0.4.1 (no need to compile from source; it can be installed via conda).
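
A quick way to check whether an environment matches the version mentioned above is a small Python sketch like the following. `matches_tested_version` is a hypothetical helper written for this thread, not part of PyTorch or Fairseq, and the default of "0.4.1" simply reflects the version named for the fairseq0.5 branch.

```python
def matches_tested_version(installed, tested="0.4.1"):
    """Return True if `installed` has the same major.minor.patch as `tested`."""

    def numeric_prefix(version):
        # Best effort: drop local/build suffixes such as "+cu90" or ".post2".
        parts = []
        for piece in version.split("+")[0].split("."):
            if not piece.isdigit():
                break
            parts.append(int(piece))
        return tuple(parts[:3])

    return numeric_prefix(installed) == numeric_prefix(tested)


try:
    import torch  # only needed for the live check below

    print(torch.__version__, matches_tested_version(torch.__version__))
except ImportError:
    print("PyTorch is not installed in this environment")
```

For the original branch, the same helper could be called with `tested="0.2.0"`.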

Thank you for your help!
I had trouble running the original branch.
(I had to test multiple PyTorch and Fairseq versions and modify some code.)

I'll test the new version and post the result.

Hahaha, I found the problem, and it was a trivial mistake.

I should have run training/run_trained_model.sh, but I ran run.sh instead.

Sorry for bothering you over my own mistake.

(Anyway, the fairseq0.5 branch worked well.)

@theincluder @shamilcm @gurunath-p

I am having trouble getting the M2 score. I ran
./run_trained_model.sh

and got output.bpe.nbest.txt, output.bpe.txt and output.tok.txt.

But I could not get the M2 score.

Note: I did not train the reranker; I ran it without one. Can you tell me what I am missing? Any help would be appreciated.

If you have decoded the CoNLL-2014 test set, you need to get the reference M2 file from https://www.comp.nus.edu.sg/~nlp/conll14st.html. Download the annotated test data. The reference M2 file for the competition is the official-2014.combined.m2 file in the no-alt/ directory. Download the official M2 scorer from the same page. Run the M2 scorer using ./m2scorer output.tok.txt /path/to/official-2014.combined.m2
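
For reference, the F0.5 score discussed in this thread combines precision and recall, weighting precision more heavily. A minimal Python sketch of the standard F-beta formula follows; `f_beta` is a hypothetical helper for illustration, and the precision and recall values in the example are made up, not results from this model.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta=0.5 weights precision more than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only (not taken from any run in this thread).
print(f"F0.5 = {f_beta(0.6, 0.2):.4f}")
```

With beta below 1, a system that proposes few but mostly correct edits scores higher than one with the same recall and lower precision.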

@shamilcm
Thanks for the link. I ran the M2 scorer, but the problem was a mismatch between output.tok.txt and conll14-test.m2.

image
So I followed this other issue:
#2

Following what you suggested there, I did the following:

  1. Used interactive.py instead of generate.py with --interactive.
  2. Tried to preprocess again with --testpref set to conll14-test.tok.src. Now the error is that I don't have the conll14-test.tok.trgt target file.
    image

I would be very thankful if you could help me here. Thanks in advance.
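
One simple check for a mismatch like the one described above is to compare the number of source sentences in the M2 file (lines starting with "S ") with the number of lines in the system output. The Python sketch below assumes the mismatch is in sentence counts, which the thread does not confirm; the helper names and the sample strings are hypothetical.

```python
def count_m2_sentences(m2_text):
    """Count source sentences in M2-format text (lines that start with 'S ')."""
    return sum(1 for line in m2_text.splitlines() if line.startswith("S "))


def check_alignment(m2_text, hyp_text):
    """Return (n_reference, n_hypotheses, counts_match)."""
    n_ref = count_m2_sentences(m2_text)
    n_hyp = len(hyp_text.splitlines())  # one hypothesis per line
    return n_ref, n_hyp, n_ref == n_hyp


# Tiny made-up sample in M2 format: "S" lines are sources, "A" lines are edits.
m2_sample = (
    "S This are a test .\n"
    "A 1 2|||SVA|||is|||REQUIRED|||-NONE-|||0\n"
    "\n"
    "S It is fine .\n"
    "\n"
)
hyp_sample = "This is a test .\nIt is fine .\n"

print(check_alignment(m2_sample, hyp_sample))
```

To run it on real files, read output.tok.txt and the reference M2 file into strings and pass them to `check_alignment`. If the counts differ, the output probably does not correspond line by line to the reference.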

@shamilcm
I have another question about the accuracy of the model.
These are the results I got with one model:
image

Can you say more about the other wiki corpora described in the paper that are used to improve the F0.5 score?
Could you also share how you created the ensemble with different initializations? I would like to know that too.

Thanks a lot in advance.