Accuracy of trained model?
I've trained the mlconv model using train_embed.sh with the hyperparameters in the script. Training finished without error after 6 epochs, but I can't reproduce the F0.5 score from the paper: my model achieved an F0.5 of 0.18, far from the reported 0.45, and the decoded output (output.tok.txt) was also terrible.
Has anyone else run into this problem?
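(For context: F0.5 is the precision-weighted F-measure reported by the M2 scorer; with β = 0.5, precision counts twice as much as recall. A minimal sketch for sanity-checking a score from its precision and recall; the P and R values below are made-up placeholders, not numbers from this run:)

```bash
# F-beta with beta = 0.5: F = (1 + b^2) * P * R / (b^2 * P + R), where b^2 = 0.25.
# P and R are hypothetical placeholders for illustration only.
P=0.50
R=0.30
echo "scale=4; (1 + 0.25) * $P * $R / (0.25 * $P + $R)" | bc -l
# prints .4411, i.e. F0.5 of roughly 0.44
```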
Here's my training log:
```
+ set -e
+ source ../paths.sh
++++ dirname ../paths.sh
+++ cd ..
+++ pwd
++ BASE_DIR=/home/account/torch_gec/mlconvgec2018
++ DATA_DIR=/home/account/torch_gec/mlconvgec2018/data
++ MODEL_DIR=/home/account/torch_gec/mlconvgec2018/models
++ SCRIPTS_DIR=/home/account/torch_gec/mlconvgec2018/scripts
++ SOFTWARE_DIR=/home/account/torch_gec/mlconvgec2018/software
+ FAIRSEQPY=/home/account/torch_gec/mlconvgec2018/software/fairseq-py
+ EMBED_PATH=/home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec
+ '[' '!' -f /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec ']'
+ SEED=1000
+ DATA_BIN_DIR=processed/bin
+ OUT_DIR=models/mlconv_embed/model1000/
+ mkdir -p models/mlconv_embed/model1000/
+ PYTHONPATH=/home/account/torch_gec/mlconvgec2018/software/fairseq-py:
+ CUDA_VISIBLE_DEVICES=0
+ python3.5 /home/account/torch_gec/mlconvgec2018/software/fairseq-py/train.py --save-dir models/mlconv_embed/model1000/ --encoder-embed-dim 500 --encoder-embed-path /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec --decoder-embed-dim 500 --decoder-embed-path /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec --decoder-out-embed-dim 500 --dropout 0.2 --clip-norm 0.1 --lr 0.25 --min-lr 1e-4 --encoder-layers '[(1024,3)] * 7' --decoder-layers '[(1024,3)] * 7' --momentum 0.99 --max-epoch 100 --batch-size 32 --seed 1000 processed/bin
Namespace(arch='fconv', batch_size=32, clip_norm=0.1, data='processed/bin', decoder_attention='True', decoder_embed_dim=500, decoder_embed_path='/home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec', decoder_layers='[(1024,3)] * 7', decoder_out_embed_dim=500, dropout=0.2, encoder_embed_dim=500, encoder_embed_path='/home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec', encoder_layers='[(1024,3)] * 7', force_anneal=0, label_smoothing=0, log_interval=1000, lr=0.25, lrshrink=0.1, max_epoch=100, max_positions=1024, max_tokens=0, min_lr=0.0001, model='fconv', momentum=0.99, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, restore_file='checkpoint_last.pt', sample_without_replacement=0, save_dir='models/mlconv_embed/model1000/', save_interval=-1, seed=1000, source_lang=None, target_lang=None, test_batch_size=32, test_subset='test', train_subset='train', valid_batch_size=32, valid_script=None, valid_subset='valid', weight_decay=0.0, workers=1)
| [src] dictionary: 30004 types
| [trg] dictionary: 30004 types
| processed/bin valid 5448 examples
| processed/bin train 1298763 examples
| processed/bin test 5448 examples
| using 1 GPUs (with max tokens per GPU = None)
| model fconv
| Loading encoder embeddings from /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec
| Found 25760/30004 types in embeddings file.
| Loading decoder embeddings from /home/account/torch_gec/mlconvgec2018/models/embeddings/wiki_model.vec
| Found 25678/30004 types in embeddings file.
training epoch: 1
| epoch 001: 0%| | 0/40587 [00:00<?, ?it/s]/home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py:172: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
x = F.softmax(x.view(sz[0] * sz[1], sz[2]))
| epoch 001 | train loss 2.77 | train ppl 6.82 | s/checkpoint 4717 | words/s 5024 | words/batch 584 | bsz 32 | lr 0.250000 | clip 100% | gnorm 0.7062
| epoch 001 | valid on 'valid' subset: 0%| | 0/2738 [00:00<?, ?it/s]/home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/utils.py:143: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  return Variable(tensor, volatile=volatile)
/home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/multiprocessing_trainer.py:213: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
return loss.data[0]
| epoch 001 | valid on 'valid' subset | valid loss 10.83 | valid ppl 1823.06
| epoch 001 | saving checkpoint
| epoch 001 | saving best checkpoint
| epoch 001 | saving last checkpoint
training epoch: 2
| epoch 002 | train loss 2.12 | train ppl 4.35 | s/checkpoint 4726 | words/s 5014 | words/batch 584 | bsz 32 | lr 0.250000 | clip 100% | gnorm 0.4339
| epoch 002 | valid on 'valid' subset | valid loss 12.48 | valid ppl 5704.88
| epoch 002 | saving checkpoint
| epoch 002 | saving last checkpoint
training epoch: 3
| epoch 003 | train loss 1.80 | train ppl 3.47 | s/checkpoint 4731 | words/s 5009 | words/batch 584 | bsz 32 | lr 0.025000 | clip 100% | gnorm 0.3773
| epoch 003 | valid on 'valid' subset | valid loss 11.59 | valid ppl 3079.46
| epoch 003 | saving checkpoint
| epoch 003 | saving last checkpoint
training epoch: 4
| epoch 004 | train loss 1.73 | train ppl 3.31 | s/checkpoint 4734 | words/s 5006 | words/batch 584 | bsz 32 | lr 0.002500 | clip 100% | gnorm 0.3848
| epoch 004 | valid on 'valid' subset | valid loss 10.10 | valid ppl 1096.12
| epoch 004 | saving checkpoint
| epoch 004 | saving best checkpoint
| epoch 004 | saving last checkpoint
training epoch: 5
| epoch 005 | train loss 1.72 | train ppl 3.29 | s/checkpoint 4742 | words/s 4997 | words/batch 584 | bsz 32 | lr 0.002500 | clip 100% | gnorm 0.3883
| epoch 005 | valid on 'valid' subset | valid loss 12.75 | valid ppl 6908.31
| epoch 005 | saving checkpoint
| epoch 005 | saving last checkpoint
training epoch: 6
| epoch 006 | train loss 1.71 | train ppl 3.28 | s/checkpoint 4744 | words/s 4995 | words/batch 584 | bsz 32 | lr 0.000250 | clip 100% | gnorm 0.3894
| epoch 006 | valid on 'valid' subset | valid loss 10.83 | valid ppl 1823.29
| epoch 006 | saving checkpoint
| epoch 006 | saving last checkpoint
| done training in 28644.4 seconds
/home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/utils.py:143: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  return Variable(tensor, volatile=volatile)
/home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py:172: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
x = F.softmax(x.view(sz[0] * sz[1], sz[2]))
/home/account/torch_gec/mlconvgec2018/software/fairseq-py/fairseq/sequence_generator.py:357: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
probs = F.softmax(decoder_out[:, -1, :]).data
| Test on test with beam=1: BLEU4 = 79.83, 90.3/82.8/76.5/70.9 (BP=1.000, ratio=0.991, syslen=148117, reflen=146808)
| Test on test with beam=5: BLEU4 = 80.41, 91.0/83.4/77.1/71.5 (BP=1.000, ratio=0.992, syslen=147983, reflen=146808)
| Test on test with beam=10: BLEU4 = 80.58, 91.1/83.6/77.3/71.7 (BP=1.000, ratio=0.991, syslen=148081, reflen=146808)
| Test on test with beam=20: BLEU4 = 80.64, 91.1/83.6/77.4/71.8 (BP=1.000, ratio=0.991, syslen=148140, reflen=146808)
```
I downloaded the two training datasets (NUCLE and Lang-8 v2) and ran prepare_data.sh and preprocess.sh; both completed without error.
Can you share your train log?
> Can you share your train log?

I have one in the original post. I'll upload another after the next training run.
Sorry, I missed the log file.
You seem to be using a newer version of PyTorch than what we used for this project. We used an old fork of Fairseq (https://github.com/shamilcm/fairseq-py), which required PyTorch 0.2.0 compiled from source.
If you want to use a later version of Fairseq (v0.5), you can use the scripts in the fairseq0.5 branch of our repository (https://github.com/nusnlp/mlconvgec2018/tree/fairseq0.5). It has been tested to work with PyTorch 0.4.1 (no compilation from source needed; it can be installed via conda).
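(For anyone setting up the fairseq0.5 branch later, a minimal environment sketch; the Python version and conda channel here are assumptions based on the comment above, not taken from the repository, so check the branch README:)

```bash
# Hypothetical setup for the fairseq0.5 branch. Python version and channel
# are assumptions, not taken from the repository's documentation.
conda create -n mlconvgec python=3.6
conda activate mlconvgec
conda install pytorch=0.4.1 -c pytorch   # no compilation from source needed
```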
Thank you for your help!
I had trouble running the original branch (I had to test multiple PyTorch and Fairseq versions and modify some code).
I'll test the new branch and post the result.
Hahaha, I found the problem, and it was a trivial mistake: I should have run training/run_trained_model.sh, but I ran run.sh instead.
Sorry for bothering you over my own mistake.
(Anyway, the fairseq0.5 branch worked well.)
@theincluder @shamilcm @gurunath-p
I am having trouble getting the M2 score. I ran ./run_trained_model.sh and got output.bpe.nbest.txt, output.bpe.txt, and output.tok.txt, but I could not get the M2 score.
Note: I did not train the reranker; I ran without it. Can you tell me what I am missing? Any help would be appreciated.
If you have decoded the CoNLL-2014 test set, you need to get the reference M2 file from https://www.comp.nus.edu.sg/~nlp/conll14st.html. Download the annotated test data; the reference M2 file for the competition is the official-2014.combined.m2 file in the no-alt/ directory. Download the official M2 scorer from the same page, then run it with `./m2scorer output.tok.txt /path/to/official-2014.combined.m2`.
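(Concretely, the steps above look roughly like this; the archive names are assumptions, so take the actual download links from the CoNLL-2014 page:)

```bash
# Sketch of the scoring steps described above. Archive names are assumptions;
# get the real links from https://www.comp.nus.edu.sg/~nlp/conll14st.html
tar -xzf conll14st-test-data.tar.gz   # annotated CoNLL-2014 test data
tar -xzf m2scorer.tar.gz              # official M2 scorer

# Score the system output against the official combined reference
./m2scorer/m2scorer output.tok.txt \
    conll14st-test-data/no-alt/official-2014.combined.m2
```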
@shamilcm
Thanks for the link. I ran the M2 scorer, but the problem was a mismatch between output.tok.txt and conll14-test.m2. So I followed another issue (#2) and did as you suggested there:
- I used interactive.py instead of generate.py (with --interactive).
- I tried to preprocess again with --testpref pointing at conll14-test.tok.src. Now the error is that I don't have a conll14-test.tok.trgt target file (a possible workaround is sketched below).
I would be very grateful if you could help me here. Thanks in advance.
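(One possible workaround, an assumption on my part rather than something from the maintainers: fairseq's preprocessing expects both language sides, but the target side of the test set is not used during decoding, so a dummy copy of the source file is usually enough to unblock the binarization. File and dictionary names below mirror the error described above and are assumptions:)

```bash
# Hypothetical workaround: create a dummy target side so preprocessing runs.
# Decoding never reads it. File/suffix names mirror the error message above.
cp conll14-test.tok.src conll14-test.tok.trgt
python preprocess.py --source-lang src --target-lang trgt \
    --testpref conll14-test.tok --destdir processed/bin \
    --srcdict processed/bin/dict.src.txt --tgtdict processed/bin/dict.trgt.txt
# --srcdict/--tgtdict reuse the training dictionaries; these paths are assumptions.
```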
@shamilcm
I have another question regarding the accuracy of the model. I got the results for one model. Can you say more about the additional Wikipedia corpora described in the paper that are used to bolster the F0.5 score?
Could you also share how you created the ensemble with different initializations? I would like to know that too.
Thanks a lot in advance.
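(Pending the authors' answer, the usual recipe for "different initializations" is to train several copies of the model that differ only in the random seed and decode with all checkpoints as one ensemble. A sketch reusing the SEED/OUT_DIR variables from train_embed.sh; the seed values are arbitrary examples, the flags are abbreviated, and the --path separator differs across fairseq versions, so verify against your generate.py:)

```bash
# Sketch: train several models differing only in the random seed, then decode
# with all of them as an ensemble. Flags abbreviated; reuse the full set from
# the train_embed.sh log above. Seed values are arbitrary examples.
for SEED in 1000 2000 3000 4000; do
    OUT_DIR=models/mlconv_embed/model$SEED
    mkdir -p "$OUT_DIR"
    python train.py --save-dir "$OUT_DIR" --seed "$SEED" \
        --encoder-embed-dim 500 --decoder-embed-dim 500 --dropout 0.2 \
        --clip-norm 0.1 --lr 0.25 --momentum 0.99 --batch-size 32 processed/bin
done

# At decode time, pass all checkpoints to --path. Whether the separator is
# ':' or ',' depends on the fairseq version, so check your generate.py.
python generate.py --path \
    models/mlconv_embed/model1000/checkpoint_best.pt:models/mlconv_embed/model2000/checkpoint_best.pt \
    processed/bin
```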