glample/tagger

EVALUATE

Closed this issue · 5 comments

I don't know how to evaluate my model (the folder that was created after running train.py).
Can anybody please help me?

Hi @binhna, I am new to Python. Can you please help me train the model using the GoogleNews word embeddings? I am trying to train using this command:

python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb --lr_method=adam --tag_scheme=iob --pre_emb=GoogleNews-vectors-negative300.bin --all_emb=300

I got this error:
[image: screenshot of the error, not preserved]

I have been stuck on this issue for about 2 months and haven't been able to resolve it. Thanks in advance.

Can you show me the first 3 lines of the word embedding that you are using?
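A quick way to check this, as a minimal sketch: the helper below (the name `head_lines` and its parameters are made up for illustration) returns the first few lines of a file and also handles `.gz` files. It assumes the embeddings are stored as plain text; a binary word2vec `.bin` file will mostly show up as unreadable bytes, which is itself a useful hint about the format.

```python
import gzip


def head_lines(path, n=3, max_bytes=4096):
    """Return up to the first n lines of a (possibly gzipped) file as text.

    Each line is capped at max_bytes so that a binary file without
    newlines cannot pull a huge chunk into memory.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lines = []
    with opener(path, "rb") as handle:
        for _ in range(n):
            raw = handle.readline(max_bytes)
            if not raw:  # end of file reached
                break
            # Undecodable bytes are replaced rather than raising an error.
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
    return lines
```

Calling `head_lines("GoogleNews-vectors-negative300.bin")` and printing the result would show whether the file looks like text or binary.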

@binhna Thanks a lot for your response. I am using word2vec-GoogleNews-vectors, as provided in the link below. It's a .bin.gz file:
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit

@binhna Sorry to disturb you again. I skipped the word2vec-GoogleNews-vectors file and the other parameters and tried to train the model using the dataset already provided, but I got another error. Am I doing something wrong while training?

(env_name27) C:\Users\Acer\tagger-master>python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb --tag_scheme=iob
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GT 620M (CNMeM is enabled with initial size: 85.0% of memory, cuDNN not available)
Model location: ./models
Found 23624 unique words (203621 in total)
Found 84 unique characters
Found 9 unique named entity tags
14041 / 3250 / 3453 sentences in train / dev / test.
Saving the mappings to disk...
Compiling...
Starting epoch 0...
50, cost average: 14.516134
100, cost average: 8.294904
150, cost average: 14.409883
200, cost average: 11.035920
250, cost average: 14.829118
300, cost average: 8.705193
350, cost average: 10.033119
400, cost average: 10.041572
450, cost average: 11.864815
500, cost average: 10.191026
550, cost average: 11.418326
600, cost average: 10.012394
650, cost average: 10.535731
700, cost average: 12.022213
750, cost average: 10.865187
800, cost average: 10.012271
850, cost average: 10.825798
900, cost average: 12.069555
950, cost average: 11.846591
'.' is not recognized as an internal or external command,
operable program or batch file.
ID  NE      Total      O  B-LOC  B-PER  B-ORG  I-PER  I-ORG  B-MISC  I-LOC  I-MISC  Percent
 0  O       42759  42759      0      0      0      0      0       0      0       0  100.000
 1  B-LOC    1837   1837      0      0      0      0      0       0      0       0    0.000
 2  B-PER    1842   1842      0      0      0      0      0       0      0       0    0.000
 3  B-ORG    1341   1341      0      0      0      0      0       0      0       0    0.000
 4  I-PER    1307   1307      0      0      0      0      0       0      0       0    0.000
 5  I-ORG     751    751      0      0      0      0      0       0      0       0    0.000
 6  B-MISC    922    922      0      0      0      0      0       0      0       0    0.000
 7  I-LOC     257    257      0      0      0      0      0       0      0       0    0.000
 8  I-MISC    346    346      0      0      0      0      0       0      0       0    0.000
42759/51362 (83.25026%)
Traceback (most recent call last):
  File "train.py", line 220, in <module>
    dev_data, id_to_tag, dico_tags)
  File "C:\Users\Acer\tagger-master\utils.py", line 282, in evaluate
    return float(eval_lines[1].strip().split()[-1])
IndexError: list index out of range
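
The traceback shows eval_lines[1] being read when the list has fewer than two entries. One plausible reading, given the "'.' is not recognized" message printed just before the table, is that an external evaluation command did not run under the Windows command prompt and so produced no output. That is an assumption, not something confirmed in this thread. A defensive version of the parsing step might look like the sketch below; the function name `parse_overall_score` is hypothetical and not part of the repository.

```python
def parse_overall_score(eval_lines):
    """Return the last number on the second line of the evaluation output.

    Mirrors the expression in the traceback, but fails with a clearer
    message when the evaluation output is missing or too short.
    """
    if len(eval_lines) < 2:
        raise RuntimeError(
            "evaluation output has %d line(s); the external evaluation "
            "command may not have run" % len(eval_lines)
        )
    return float(eval_lines[1].strip().split()[-1])
```

With this guard, an empty scores file raises a RuntimeError that points at the evaluation step instead of a bare IndexError.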

Hey, sorry for the delay. You can do this by running the train.py script again, using the reload parameter. You will have to edit the code of train.py a bit so that it skips training and goes directly to the evaluation part.
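
The control flow described above could be sketched roughly as follows. This is not the actual train.py: the `--eval_only` flag is hypothetical, the exact form of `--reload` is assumed, and `reload_model`, `train_one_epoch`, and `evaluate_dev` are placeholder callables standing in for whatever the real script does at those steps.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # The reply mentions a reload parameter; its exact form here is assumed.
    parser.add_argument("--reload", type=int, default=0)
    # Hypothetical flag, introduced only for this sketch.
    parser.add_argument("--eval_only", type=int, default=0)
    return parser.parse_args(argv)


def run(opts, reload_model, train_one_epoch, evaluate_dev, n_epochs=100):
    """Optionally reload a saved model, optionally train, then evaluate."""
    if opts.reload:
        reload_model()
    if not opts.eval_only:
        for epoch in range(n_epochs):
            train_one_epoch(epoch)
    # Evaluation runs whether or not training was skipped.
    return evaluate_dev()
```

Invoking such a script with `--reload 1 --eval_only 1` would then load the saved model and go straight to evaluation.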