Training new SRL model: Unexpected role data
GraphGrailAi opened this issue · 1 comment
I am trying to train a new SRL model:
root@engine:/var/www/engine/nlpnet-master/bin# nlpnet-train.py srl pred --gold train_google_ru.txt --data srl-model/
using a .txt file with two sentences, each on its own line:
Его уверенная поступь – предмет зависти топ-менеджеров и разработчиков по всему миру.
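Для техно-евангелистов Google – это самая крупная жемчужина сокровищницы.
(English translation: "Its confident stride is the envy of top managers and developers all over the world." / "For tech evangelists, Google is the biggest pearl in the treasure trove.")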
Running it fails with this error:
Reading training data...
Traceback (most recent call last):
  File "/usr/local/bin/nlpnet-train.py", line 248, in <module>
    text_reader = create_reader(args, md)
  File "/usr/local/bin/nlpnet-train.py", line 61, in create_reader
    only_predicates=args.predicates)
  File "/usr/local/lib/python3.4/dist-packages/nlpnet/srl/srl_reader.py", line 70, in __init__
    self._read_conll(filename)
  File "/usr/local/lib/python3.4/dist-packages/nlpnet/srl/srl_reader.py", line 130, in _read_conll
    tag, expected_role = self._read_role(tag, 'O', True)
  File "/usr/local/lib/python3.4/dist-packages/nlpnet/srl/srl_reader.py", line 185, in _read_role
    raise ValueError('Unexpected role data: %s' % role)
ValueError: Unexpected role data: по
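The traceback shows that `_read_role` received the plain word "по" where it expected a semantic role tag. The sketch below assumes CoNLL-2005-style bracket tags ("*", "*)", "(A0*", "(A0*)", "(V*)") and shows how a raw sentence line fails such a check: split on whitespace, every word becomes a "column", and ordinary words end up where role tags should be. The names `is_role_tag`, `find_bad_role_fields` and the `first_role_col` parameter are hypothetical, not part of nlpnet, and the column index used is arbitrary.

```python
import re
from typing import List

# Assumed CoNLL-2005-style role tags: "*", "*)", "(A0*", "(A0*)", "(AM-LOC*"...
ROLE_TAG = re.compile(r"^(?:\([A-Za-z0-9-]+)?\*\)?$")


def is_role_tag(field: str) -> bool:
    """Return True if `field` looks like a bracket-notation role tag."""
    return ROLE_TAG.match(field) is not None


def find_bad_role_fields(line: str, first_role_col: int) -> List[str]:
    """Split a data line on whitespace and return every field at or after
    `first_role_col` (hypothetical column index) that is not a role tag."""
    fields = line.split()
    return [f for f in fields[first_role_col:] if not is_role_tag(f)]


if __name__ == "__main__":
    # A raw sentence: every word is treated as a separate column.
    raw = "Его уверенная поступь – предмет зависти топ-менеджеров и разработчиков по всему миру."
    print(find_bad_role_fields(raw, first_role_col=1))
    # A token line in bracket notation: word, predicate marker, role column.
    print(find_bad_role_fields("поступь - (A0*)", first_role_col=2))
```

Running such a check over the training file before calling `nlpnet-train.py` points at every offending line at once instead of stopping at the first bad value.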
The error is strange: the reader seems unable to handle some words, for example "по", which here corresponds to the English preposition "over" (as in "all over the world"). If I remove the offending words, training runs as follows:
root@engine:/var/www/engine/nlpnet-master/bin# nlpnet-train.py srl pred --gold train_google_ru.txt --data srl-model/
Reading training data...
Loading vocabulary
Creating new network...
Generating word type features...
Created new network with the following layer sizes: 250, 50, 2
Training for up to 1 epochs
1 epochs Error: 0.000000 Accuracy: 1.000000 0 corrections skipped learning rate: 0.010000
Finished training
So, what do you advise to solve this? And how many epochs do I need to train the model for to achieve good results?
Sorry for the very late response.
This error message says that your training data is not formatted correctly. A description of the format can be found in the nlpnet documentation for SRL; it is basically the CoNLL format.
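To make "basically the CoNLL format" concrete, here is a minimal sketch modeled on the CoNLL-2005 shared task: one token per line, sentences separated by blank lines, and one bracket-notation role column per predicate. The exact columns nlpnet expects are defined in its SRL documentation; the three-column layout below (word, predicate lemma or "-", role column) is an assumption for illustration, and `brackets_to_iob` is a hypothetical helper, not nlpnet code. It converts a role column to per-token IOB tags and rejects anything that is not a role tag, which is the kind of value that triggered the error above.

```python
import re
from typing import List

# Assumed CoNLL-2005-style role tags: "*", "*)", "(A0*", "(A0*)", "(V*)"...
ROLE_TAG = re.compile(r"^(?:\([A-Za-z0-9-]+)?\*\)?$")

# Hypothetical layout: word, predicate lemma (or "-"), one role column
# for the single predicate "chase".
SAMPLE = """\
The     -      (A0*
cat     -      *)
chased  chase  (V*)
the     -      (A1*
mouse   -      *)
.       -      *
"""


def brackets_to_iob(column: List[str]) -> List[str]:
    """Convert one bracket-notation role column to IOB tags.

    "(A0*" opens an A0 span, "*" continues the open span (or is outside
    any span), "*)" closes the open span, "(A0*)" is a one-token span.
    Any other value raises ValueError.
    """
    tags = []
    current = None  # role of the currently open span, if any
    for cell in column:
        if not ROLE_TAG.match(cell):
            raise ValueError("Unexpected role data: %s" % cell)
        if cell.startswith("("):
            current = cell[1:cell.index("*")]
            tags.append("B-" + current)
        elif current is not None:
            tags.append("I-" + current)
        else:
            tags.append("O")
        if cell.endswith(")"):
            current = None
    return tags


if __name__ == "__main__":
    rows = [line.split() for line in SAMPLE.splitlines() if line.strip()]
    words = [row[0] for row in rows]
    roles = [row[-1] for row in rows]
    for word, tag in zip(words, brackets_to_iob(roles)):
        print(word, tag)
```

On the sample, "The cat" is tagged B-A0 I-A0, "chased" B-V, "the mouse" B-A1 I-A1, and the final period O. A file of raw sentences, like the one in the question, has no such annotation, so there is nothing for the SRL model to learn from.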