VerbForm should not be required in some languages
dan-zeman opened this issue · 3 comments
The MarkBugs block currently reports a bug if a verb does not have the VerbForm feature. This is a good requirement for many languages (e.g., all Indo-European languages). However, some languages do not distinguish finite and non-finite verb forms, and then it would make sense to omit the feature from the language. Indonesian seems to be an example.
What would be the best way of limiting the requirement to certain languages or language families?
What about ud.MarkBugs skip=no-VerbForm?
If you have the list of languages where empty VerbForm is allowed, you can add it to MarkBugs in a PR. The language should be encoded in the "zone" label, so you could use something like if node.root.zone.split("_")[0] in {"id", "xy", ...}. To make this work, you either need to have the language code encoded in tree IDs, or use read.Conllu zone=id.
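To make the suggested check concrete, here is a minimal sketch in plain Python of how the language code could be pulled out of a zone label and compared against a list of exempt languages. It does not touch Udapi itself: the names LANGS_WITHOUT_VERBFORM, language_of and verbform_required are hypothetical, and the set contains only "id" (Indonesian), the one language mentioned in this thread. Inside MarkBugs the zone string would presumably come from node.root.zone, as in the snippet above.

```python
# Hypothetical set of language codes where a missing VerbForm is tolerated.
# Only "id" (Indonesian) is taken from this discussion; extend as needed.
LANGS_WITHOUT_VERBFORM = {"id"}


def language_of(zone):
    """Return the language part of a zone label, e.g. 'id' from 'id_gsd'."""
    return zone.split("_")[0]


def verbform_required(zone):
    """Return True if a verb in this zone should be flagged when VerbForm is missing."""
    return language_of(zone) not in LANGS_WITHOUT_VERBFORM


if __name__ == "__main__":
    for zone in ("cs", "id", "id_gsd"):
        print(zone, verbform_required(zone))
```

Keeping the exemption in one set means the test stays on by default, and a language only drops out of it when someone deliberately adds its code.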
OK. The second solution sounds more useful to me. I call Udapi from the UD treebank evaluator and I can supply the language code via read.Conllu zone=xx, but I would prefer not to selectively turn off tests in the call.
There is the question of whether language-specific options like this can be effectively maintained within MarkBugs (when they also have to be maintained in other places). But MarkBugs will probably need occasional updates in any case, also in the area of the universal guidelines.