Token annotation error for XML output with non-standard rules
marijnschraagen opened this issue · 3 comments
Maybe related to #80?
When using XML output with non-standard rules there is a token-annotation error. Command:
frog -t myfile.txt -X myresult.xml --language=nld-vnn
Output:
frog 0.19 (c) CLTS, ILK 1998 - 2019
CLST - Centre for Language and Speech Technology,Radboud University
ILK - Induction of Linguistic Knowledge Research Group,Tilburg University
based on [ucto 0.19, libfolia 2.4, timbl 6.4.14, ticcutils 0.23, mbt 3.5]
removing old debug files using: 'find frog.*.debug -mtime +1 -exec rm {} \;'
frog-:config read from: /usr/local/share/frog/nld-vnn/frog.cfg
frog-:Missing [[mbma]] section in config file.
frog-:Disabled the Morhological analyzer.
frog-:Missing [[IOB]] section in config file.
frog-:Disabled the IOB Chunker.
frog-:Missing [[NER]] section in config file.
frog-:Disabled the NER.
frog-:Missing [[mwu]] section in config file.
frog-:Disabled the Multi Word Unit.
frog-:Also disabled the parser.
frog-mblem-:Initiating lemmmmatizer...
ucto: textcat configured from: /usr/local/share/ucto/textcat.cfg
frog-tok-:Language List =[nld-vnn]
ucto: No useful settingsfile(s) could be found (initiating from language list: [nld-vnn])
frog-tagger-tagger-:reading subsets from /usr/local/share/frog/nld-vnn//babsub.cgn
frog-tagger-tagger-:reading constraints from /usr/local/share/frog/nld-vnn//babconstraints.cgn
frog-:Thu Sep 12 19:09:35 2019 Initialization done.
frog-:Thu Sep 12 19:09:35 2019 Frogging myfile.txt
[first sentence processed ok, removed here]
Word(class='WORD-COMPOUND',generate_id='myfile.txt.p.1.s.1',
set='tokconfig-nld-vnn',space='no') creation failed: DeclarationError:
Set 'tokconfig-nld-vnn' is used but has no declaration for token-annotation
The regular column-based output works without any problems.
I can indeed replicate this. It seems related to LanguageMachines/ucto#72.
Well... The problem here is that frog uses the 'language' nld-vnn, which refers to the configuration in /usr/local/share/frog/nld-vnn/
Ucto is then initialized from /usr/local/share/frog/nld-vnn/frog.cfg
using:
[[tokenizer]]
rulesFile=tokconfig-nld-historical
So for ucto the language is nld-historical, while frog was told nld-vnn.
This is confusing for us as well as for the software...
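For anyone who wants to check their own setup for the same mismatch, here is a rough Python sketch. It is not part of frog or ucto; the function names are made up for illustration, and it assumes that the set name in the error is simply 'tokconfig-' plus the value passed to --language, which is only inferred from the error message above. It reads the rulesFile entry from the [[tokenizer]] section and compares it with that expected name.

```python
import re


def tokenizer_rules_file(cfg_text):
    """Return the rulesFile value from the [[tokenizer]] section, or None.

    Hypothetical helper: a tiny line-based reader for the double-bracket
    section headers shown in this thread, not frog's own config parser.
    """
    section = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        header = re.fullmatch(r"\[\[(.+?)\]\]", line)
        if header:
            section = header.group(1)
            continue
        if section == "tokenizer" and "=" in line:
            key, value = line.split("=", 1)
            if key.strip() == "rulesFile":
                return value.strip()
    return None


def check_language(cfg_text, language):
    """Compare the configured rulesFile with the name implied by --language.

    Assumption: the implied name is 'tokconfig-' + language, as suggested
    by the set name in the DeclarationError.
    """
    rules = tokenizer_rules_file(cfg_text)
    expected = "tokconfig-" + language
    return rules, expected, rules == expected


# The config fragment quoted in this thread.
sample = """[[tokenizer]]
rulesFile=tokconfig-nld-historical
"""

rules, expected, ok = check_language(sample, "nld-vnn")
print(rules, expected, ok)
```

With the fragment quoted above and nld-vnn as the language, the two names differ, which matches the situation that triggers the error. To run it against a real installation, read the contents of the frog.cfg file into a string and pass that instead of the sample.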
When I run Frog like this:
frog -c /usr/local/share/frog/nld-vnn/frog.cfg -X uit.xml -t txt
all seems well.
So that might be a quick workaround.
As a matter of fact, I am inclined to think that this is an abuse of the --language parameter.
It is meant to give frog a hint about the languages to detect, and NOT to tell which configuration to use.
When using --language, frog should ignore the rulesFile information from the frog config file.
This was the case until @proycon "fixed" it in #80.
That was probably putting the cart before the horse.
We need to rethink this.