glample/tagger

Running time with GPU

Closed this issue · 6 comments

Hi, how much time will be saved by running this program on GPU rather than CPU?

Hi,

This will actually be slower on GPU than on CPU. Mostly because of the operations in the CRF layer I guess, and also because the implementation does not support mini-batches.
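For intuition, here is a minimal, self-contained sketch (not the repo's actual code) of the forward recursion of a linear-chain CRF in log space. Each step depends on the previous one, so the loop over time steps is inherently sequential within a sentence; without mini-batching across sentences there is little for a GPU to parallelize.

```python
import math

def crf_log_partition(emissions, transitions):
    """Forward algorithm for a linear-chain CRF in log space.

    emissions:   list of T lists of K unary scores (one per tag)
    transitions: K x K list; transitions[i][j] = score of tag i -> tag j

    alpha at time t depends on alpha at time t-1, so the time loop
    cannot be parallelized for a single sentence.
    """
    def logsumexp(xs):
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))

    K = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [logsumexp([alpha[i] + transitions[i][j] for i in range(K)])
                 + emissions[t][j]
                 for j in range(K)]
    return logsumexp(alpha)
```

The result equals the log of the sum of exponentiated scores over all tag sequences, which is what the CRF loss normalizes by.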

Hi @glample @HaniehP, I am new to Python. Can you please help me with training the model using the GoogleNews word embeddings? I am trying to train with this command:

python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb --lr_method=adam --tag_scheme=iob --pre_emb=GoogleNews-vectors-negative300.bin --all_emb=300

I got this error:
[screenshot of the error message]

I have been stuck on this issue for about two months and couldn't resolve it. Thanks in advance.

Try 'ISO-8859-1' instead of 'UTF-8'. That helped me in another project.
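One way to do this without hard-coding a single encoding is to try several in order. A minimal sketch (the function name and `path` argument are placeholders, not code from this repo); note also that 'ISO-8859-1' maps every byte to a character, so it never raises a decode error, though it may mis-render non-Latin text. Also, GoogleNews-vectors-negative300.bin is in word2vec's binary format, so if the loader expects a text file that alone can cause a decode error.

```python
import codecs

def read_embedding_lines(path, encodings=('utf-8', 'ISO-8859-1')):
    """Try each encoding in turn until one decodes the whole file."""
    last_err = None
    for enc in encodings:
        try:
            with codecs.open(path, 'r', enc) as f:
                return f.readlines()
        except UnicodeDecodeError as err:
            last_err = err
    raise last_err
```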

@HaniehP Thank you so much for your response. In the meantime, when I tried to train the model without the word embeddings, I got another error; something seems to be wrong:
(env_name27) C:\Users\Acer\tagger-master>python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GT 620M (CNMeM is enabled with initial size: 85.0% of memory, cuDNN not available)
Model location: ./models
Found 23624 unique words (203621 in total)
Found 84 unique characters
Found 17 unique named entity tags
14041 / 3250 / 3453 sentences in train / dev / test.
Saving the mappings to disk...
Compiling...
Starting epoch 0...
50, cost average: 15.406189
100, cost average: 11.704297
150, cost average: 10.767459
200, cost average: 13.812738
250, cost average: 11.460194
300, cost average: 13.207466
350, cost average: 12.146099
400, cost average: 12.428576
450, cost average: 10.977689
500, cost average: 12.830771
550, cost average: 10.062991
600, cost average: 9.834551
650, cost average: 11.481623
700, cost average: 9.460655
750, cost average: 9.907359
800, cost average: 10.251657
850, cost average: 10.405848
900, cost average: 14.113665
950, cost average: 10.436158
'.' is not recognized as an internal or external command,
operable program or batch file.
ID NE Total O S-LOC B-PER E-PER S-ORG S-MISC B-ORG E-ORG S-PER I-ORG B-LOC E-LOC B-MISC E-MISC I-MISC I-PER I-LOC Percent
0 O 42759 42759 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100.000
1 S-LOC 1603 1603 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
2 B-PER 1234 1234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
3 E-PER 1234 1234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
4 S-ORG 891 891 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
5 S-MISC 665 665 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
6 B-ORG 450 450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
7 E-ORG 450 450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
8 S-PER 608 608 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
9 I-ORG 301 301 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
10 B-LOC 234 234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
11 E-LOC 234 234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
12 B-MISC 257 257 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
13 E-MISC 257 257 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
14 I-MISC 89 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
15 I-PER 73 73 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
16 I-LOC 23 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
42759/51362 (83.25026%)
Traceback (most recent call last):
File "train.py", line 220, in
dev_data, id_to_tag, dico_tags)
File "C:\Users\Acer\tagger-master\utils.py", line 282, in evaluate
return float(eval_lines[1].strip().split()[-1])
IndexError: list index out of range
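The IndexError is likely a downstream symptom: `evaluate` in utils.py parses the output of an external evaluation script, and the earlier line "'.' is not recognized as an internal or external command" suggests that script failed to launch on Windows, so its output list is empty. A hedged sketch of a more defensive parse (the function name is hypothetical, but the final line mirrors utils.py line 282):

```python
def parse_eval_score(eval_lines):
    """Parse the accuracy from the external evaluation script's output.

    If the script failed to run (its output is empty or truncated),
    fail with a clear message instead of an IndexError.
    """
    if len(eval_lines) < 2:
        raise RuntimeError(
            "evaluation script produced no output -- check that it can "
            "actually be executed on this platform (it may be a shell or "
            "perl script that does not run on Windows)"
        )
    return float(eval_lines[1].strip().split()[-1])
```

In other words, the fix is to get the evaluation script running (or run under a Unix-like environment), not to patch the parsing line.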

@HaniehP Please guide me: how can I convert my word embeddings .txt file to 'ISO-8859-1'?

In your Python code, replace "codecs.open(path, 'r', 'utf8')" with "codecs.open(path, 'r', 'ISO-8859-1')".
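If you would rather convert the file itself once instead of changing the reading code, a minimal sketch (the function name and paths are placeholders):

```python
import codecs

def reencode(src, dst, src_enc='ISO-8859-1', dst_enc='utf-8'):
    """Rewrite a text file from one encoding to another, line by line."""
    with codecs.open(src, 'r', src_enc) as fin, \
         codecs.open(dst, 'w', dst_enc) as fout:
        for line in fin:
            fout.write(line)
```

Swap `src_enc`/`dst_enc` to go in the other direction; characters that do not exist in the target encoding will raise an error rather than being silently corrupted.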