Frog can't find ucto's configuration file for non-standard rules?
proycon opened this issue · 7 comments
There's something wrong with the installation of the historical models still. Frog can't seem to find the tokeniser settings:
$ frog --language dum
frog 0.19 (c) CLTS, ILK 1998 - 2019
CLST - Centre for Language and Speech Technology,Radboud University
ILK - Induction of Linguistic Knowledge Research Group,Tilburg University
based on [ucto 0.19, libfolia 2.4, timbl 6.4.14, ticcutils 0.23, mbt 3.5]
removing old debug files using: 'find frog.*.debug -mtime +1 -exec rm {} \;'
frog-:config read from: /data2/dev/share/frog/dum/frog.cfg
frog-:Missing [[mbma]] section in config file.
frog-:Disabled the Morhological analyzer.
frog-:Missing [[IOB]] section in config file.
frog-:Disabled the IOB Chunker.
frog-:Missing [[NER]] section in config file.
frog-:Disabled the NER.
frog-:Missing [[mwu]] section in config file.
frog-:Disabled the Multi Word Unit.
frog-:Also disabled the parser.
frog-mblem-frog-mblem-:Initiating lemmatizer...
ucto: textcat configured from: /data2/dev/share/ucto/textcat.cfg
frog-tok-:Language List =[dum]
ucto: No useful settingsfile(s) could be found.
frog-tagger-tagger-:reading subsets from /data2/dev/share/frog/dum//crmsub.cgn
frog-tagger-tagger-:reading constraints from /data2/dev/share/frog/dum//crmconstraints.cgn
frog-:Initialization failed for: [tokenizer]
frog-:fatal error: Frog init failed
$ cat /data2/dev/share/frog/dum/frog.cfg | grep tok
[[tokenizer]]
rulesFile=tokconfig-nld-historical
$ ls /data2/dev/share/ucto/*hist*
/data2/dev/share/ucto/tokconfig-nld-historical
I committed a fix that should solve this.
As stated in #82, this is IMHO a misconception of the --language parameter.
To select a specific Frog configuration, you should use the -c option.
--languages can be used to let Frog give ucto a hint about the languages to use.
In that case, the rulesFile setting should be overruled (and so ignored).
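The precedence described here can be sketched roughly as follows. This is a hypothetical Python illustration, not Frog's or ucto's actual code: the function name `pick_tokenizer_settings` and the idea that a language hint maps to a settings file named `tokconfig-<language>` are assumptions made for the example. Under those assumptions it would explain the log above: with `--language dum` the `rulesFile` entry is ignored, and no settings file for `dum` is found.

```python
from typing import List, Optional


def pick_tokenizer_settings(
    languages: List[str],
    rules_file: Optional[str],
    installed: List[str],
) -> List[str]:
    """Return the settings files the tokeniser would load (illustrative only)."""
    if languages:
        # Assumption: a language hint overrules rulesFile, so only
        # per-language files named tokconfig-<language> are considered.
        wanted = [f"tokconfig-{lang}" for lang in languages]
        return [name for name in wanted if name in installed]
    # Without a hint, fall back to the rulesFile from the [[tokenizer]] section.
    if rules_file is not None and rules_file in installed:
        return [rules_file]
    return []


# The one settings file shown by `ls` in the report above.
installed = ["tokconfig-nld-historical"]

# With the hint "dum", rulesFile is ignored and nothing matches.
print(pick_tokenizer_settings(["dum"], "tokconfig-nld-historical", installed))
# Without a hint, the configured rulesFile is used.
print(pick_tokenizer_settings([], "tokconfig-nld-historical", installed))
```

An empty result in the first call corresponds to the "No useful settingsfile(s) could be found" message in the report.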
I find the README description of '--language' misleading: most modules of Frog only work for Dutch, while the current description suggests that it works for English and Portuguese too.
I propose making Ko's comment above explicit in the README:
--languages is only intended as a hint to the ucto module in Frog about the languages to use.
I reverted this "fix" but also added code to make usage of -c (or --config) easier.
A command like frog -c dum/frog.cfg
now works, telling Frog to use the installed configuration
for the 'dum' language.
I also updated the usage() and the man page to be clearer about the intention of --language.
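A lookup like the one behind frog -c dum/frog.cfg could work roughly as sketched below. This is a hypothetical Python illustration, not the code that was committed: the function `resolve_config` and the fallback order (the path as given first, then the same path relative to the installed configuration directory) are assumptions. The directory /data2/dev/share/frog mentioned in the comment is taken from the log at the top of this issue.

```python
import os
import tempfile


def resolve_config(name: str, config_dir: str) -> str:
    """Return the config file to use, or raise if none exists (illustrative)."""
    # Assumed order: an existing path as given wins ...
    if os.path.isfile(name):
        return name
    # ... otherwise try it relative to the installed configuration directory.
    candidate = os.path.join(config_dir, name)
    if os.path.isfile(candidate):
        return candidate
    raise FileNotFoundError(f"no configuration found for {name!r}")


# Demo with a throwaway directory standing in for e.g. /data2/dev/share/frog.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "dum"))
with open(os.path.join(demo_dir, "dum", "frog.cfg"), "w") as handle:
    handle.write("[[tokenizer]]\nrulesFile=tokconfig-nld-historical\n")

print(resolve_config("dum/frog.cfg", demo_dir))
```

Checking the path as given first keeps an explicit local file working unchanged, while a short name such as dum/frog.cfg still reaches the installed configuration.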
I assume this is fixed now.