About testing models
renhongkai opened this issue · 12 comments
I used my trained model to decode the CoNLL-2014 test set. The output consists of four files: input.bpe.txt, output.bpe.nbest.txt, output.bpe.txt, and output.tok.txt. Which file should I use for evaluation, and with which script? Thank you very much.
Use the output.tok.txt file. We use the M2 scorer, which is the standard scorer used for evaluating the CoNLL-2014 shared task systems. Note that the evaluation of some sentences can take a long time with the standard scorer.
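To make the scorer's input concrete, here is a toy sketch (file names hypothetical) of the M2 format that the gold file uses, with the usual scorer invocation shown in a comment:

```shell
# Typical M2 scorer invocation (paths hypothetical; m2scorer ships with the
# CoNLL-2014 shared task tools):
#   ./m2scorer output.tok.txt conll14st-test.m2
# An M2 file groups each source sentence ("S ...") with its gold edits ("A ..."),
# separated by blank lines:
cat > toy.m2 <<'EOF'
S this are a sentence .
A 1 2|||Vform|||is|||REQUIRED|||-NONE-|||0

S another one .
EOF
# Count the sentences in the gold file; the system output (output.tok.txt)
# must contain exactly that many lines, in the same order.
grep -c '^S ' toy.m2
```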
Thank you very much. But I have run into a new problem: the number of sentences in the output.tok.txt file differs from the number of sentences in the conll14st-test.tok.src file. The output.tok.txt file contains 5458 sentences, which is the same size as the validation set. Can you help me?
I would be obliged if you could reply at your earliest convenience. Thanks a lot in advance for your time and attention.
I used the command: ./run.sh ./data/test/conll14st-test/conll14st-test.tok.src ./data/test/conll14st-test/output 0 ./training/models/mlconv/model1000
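Before scoring, a quick sanity check on the line counts catches this kind of mismatch early. The snippet below uses toy stand-in files (the real ones would be conll14st-test.tok.src and output.tok.txt); a mismatch usually means sentences were silently dropped, e.g. by a flag like --skip-invalid-size-inputs-valid-test:

```shell
# Toy stand-ins for the real source and output files:
printf 'src one\nsrc two\nsrc three\n' > conll14st-test.tok.src.toy
printf 'out one\nout two\n'            > output.tok.txt.toy
# The corrected output must have exactly one line per source sentence,
# otherwise the M2 scorer's sentence alignment is off.
src_n=$(wc -l < conll14st-test.tok.src.toy | tr -d ' ')
out_n=$(wc -l < output.tok.txt.toy | tr -d ' ')
if [ "$src_n" -eq "$out_n" ]; then
  echo "OK: $src_n sentences"
else
  echo "MISMATCH: src=$src_n out=$out_n"
fi
```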
$SCRIPTS_DIR/apply_bpe.py -c $TRAINING_DIR/models/bpe_model/train.bpe.model < $input_file > $output_dir/input.bpe.txt
# running fairseq on the test data
--workers $threads $MODEL_DIR/data_bin < $output_dir/input.bpe.txt > $output_dir/output.bpe.nbest.txt
CUDA_VISIBLE_DEVICES=$device python3.5 $FAIRSEQPY/generate.py --no-progress-bar --path $models --beam $beam --nbest $beam --workers $threads $TRAINING_DIR/processed/bin < $output_dir/input.bpe.txt > $output_dir/output.bpe.nbest.txt --skip-invalid-size-inputs-valid-test
The flag --interactive is necessary while running fairseq on a custom input test set.
CUDA_VISIBLE_DEVICES=$device python3.5 $FAIRSEQPY/generate.py --no-progress-bar --path $models --beam $beam --nbest $beam --interactive --workers $threads $MODEL_DIR/data_bin < $output_dir/input.bpe.txt > $output_dir/output.bpe.nbest.txt
Thanks a lot in advance for your time and attention. I have summarized the problems I encountered; I think this is a versioning problem.
First, I used the download.sh script in the software directory to download fairseq-py (GitHub: https://github.com/shamilcm/fairseq-py), but when I ran the command `python setup.py build`, I got an error: cffi.error.VerificationError: CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1.
So I switched to a different fairseq-py version (GitHub: https://github.com/facebookresearch/fairseq-py.git), and this error no longer appeared.
But then I found a new problem: the parameters do not correspond. When I ran the command "./run.sh ./data/test/conll14st-test/conll14st-test.tok.src ./data/test/conll14st-test/output 0 ./training/models/mlconv/model1000", there were two errors. The first was **generate.py: error: unrecognized arguments: --interactive**, so I removed the --interactive flag.
The second error was: Exception: Sample #10 has size (src=1, dst=1) but max size is 1022. Skip this example with --skip-invalid-size-inputs-valid-test. So I added the --skip-invalid-size-inputs-valid-test flag. The command then ran successfully, but the number of sentences in the output.tok.txt file differs from the number of sentences in the conll14st-test.tok.src file.
Can you help me? Thank you very much.
Oh ok. The version of Fairseq-py in the download.sh script compiles only against an earlier version of PyTorch (0.2.0) that is built from source.
In the recent version of fairseq-py, the developers have replaced `generate.py --interactive` with a different script, `interactive.py`:
https://github.com/facebookresearch/fairseq-py/blob/master/interactive.py
1. So, you mean I can use PyTorch (0.3.0) and remove the --interactive flag? How should I solve the problem that the number of sentences in the output.tok.txt file differs from the number of sentences in the conll14st-test.tok.src file?
2. I also tested with the pre-trained models. I ran the command "./run.sh ./data/test/conll14st-test/conll14st-test.m2 ./log/ 0 ./models/mlconv_embed/ eolm" and got the same error: the number of sentences in the output.tok.txt file differs from the number of sentences in the conll14st-test.tok.src file.
Thank you very much.
- If you use the recent version of Fairseq-py (which uses PyTorch 0.3.0), you should use the script `interactive.py` (https://github.com/facebookresearch/fairseq-py/blob/master/interactive.py) instead of `generate.py`.
- If you run `run.sh` with the recent version of Fairseq-py and not the one mentioned in the `download.sh` script, you may encounter this error. This is because `generate.py` does not have the `--interactive` flag anymore. I believe it will use the test set within the `processed/bin` directory and not the one that is provided through standard input. In our training script, we pass the development data itself to the `--testpref` flag. See: mlconvgec2018/training/preprocess.sh, line 41 in 3f270bc.
Btw, where did you obtain the 5458-sentence development set from? Did you download and process the training data yourself?
1. The data set was provided by my teacher. It includes Lang-8 and NUCLE (version 3.2); 5458 sentence pairs from NUCLE were taken out to be used as the development data. The training data includes 132M sentence pairs.
2. I will try using interactive.py instead of generate.py.
3. You mean I need to turn the test set (the conll14st-test.tok.src file) into the --testpref by running the command:
python3.5 $FAIRSEQPY/preprocess.py --source-lang src --target-lang trg --trainpref processed/train --validpref processed/dev --testpref processed/dev --nwordssrc 30000 --nwordstgt 30000 --destdir processed/bin
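For comparison, a variant of the command above that points --testpref at the actual test set might look like the sketch below. This is only an illustration: the prefix `processed/test` is hypothetical, and preprocess.py expects both a `.src` and a `.trg` file for each prefix, so the BPE-encoded test source would first have to be placed there under that naming convention.

```shell
# Sketch only: echo the command rather than run it, since fairseq-py and the
# processed/ data are not assumed to be present here.
echo "python3.5 \$FAIRSEQPY/preprocess.py --source-lang src --target-lang trg \
  --trainpref processed/train --validpref processed/dev --testpref processed/test \
  --nwordssrc 30000 --nwordstgt 30000 --destdir processed/bin" > preprocess_cmd.txt
cat preprocess_cmd.txt
```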
4. Can you explain what the training/processed/bin directory is for?
5. If I use the version of Fairseq-py that requires PyTorch 0.2.0, do I need to compile and install PyTorch from source instead of installing via pip? And do any other parameters need to be changed?
Use `interactive.py` instead of `generate.py` to decode the test set if you are using the latest Fairseq-py version. I was saying that, alternatively, you can use `generate.py` itself if you had used conll14st-test for `--testpref` while doing preprocessing. The reason, I believe, is that in the current Fairseq-py, `generate.py` automatically uses the test.src-trg.{src,trg}.{bin,idx} files within the processed/bin directory to perform decoding, whereas `interactive.py` decodes any input file that is passed through standard input.
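The distinction can be illustrated with a stand-in command (this is not fairseq): like `interactive.py`, the loop below decodes whatever arrives on standard input, whereas `generate.py` only reads the pre-binarized test split in processed/bin and ignores stdin.

```shell
# Stand-in for stdin-driven decoding: each input line yields one hypothesis.
printf 'sentence one\nsentence two\n' | while read -r line; do
  echo "H: $line"   # placeholder for a decoded hypothesis
done
```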
- The `training/processed/bin` directory contains the binarized and indexed versions of the training, development, and test datasets for faster loading during training, validation, and testing. It also contains the vocabulary files (dict.src.txt and dict.trg.txt).
- Yes, I had to compile PyTorch from source, since the Fairseq-py version that I used required the ATen library, which back then was only available in the GitHub version of PyTorch and not in the official release.
Hello again. I am also trying to test the models using run.sh, but I ran into the same problem. I want to get the M2 scores, but scoring is not part of run.sh; the output is output.bpe.nbest.txt. How can I get those scores with the trained models?
I will follow the new fairseq implementation.
Any help is appreciated.
Thanks
Thank you for the wonderful source code.
I have a favor to ask of you. The only GPU I can use is the Colab GPU, so I couldn't do the pre-training myself and wanted to use the pre-trained model instead.
https://tinyurl.com/yd6wvhgw/mlconvgec2018/models
Can I download test.src-trg.src.bin, test.src-trg.src.idx, etc., in addition to dict.src.txt, which is published at the link above?
I am referring to https://github.com/kanekomasahiro/bert-gec.