udapi/udapi-python

VerbForm should not be required in some languages

dan-zeman opened this issue · 3 comments

The MarkBugs block currently reports a bug if a verb does not have the VerbForm feature. This is a good requirement for many languages (e.g., all Indo-European languages). However, some languages do not distinguish finite and non-finite verb forms, and for those it would make sense to omit the feature entirely. Indonesian seems to be an example.

What would be the best way of limiting the requirement to certain languages or language families?

What about ud.MarkBugs skip=no-VerbForm?

If you have the list of languages where an empty VerbForm is allowed, you can add it to MarkBugs in a PR. The language should be encoded in the "zone" label, so you could use something like if node.root.zone.split("_")[0] in {"id", "xy", ...}. To make this work, you either need to have the language code encoded in the tree IDs, or use read.Conllu zone=id.
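A minimal sketch of that zone-based check, written as plain functions so it runs without Udapi. The names LANGS_WITHOUT_VERBFORM, zone_language and verbform_required are hypothetical and not part of MarkBugs; the set contains only "id" because Indonesian is the one language suggested so far in this thread.

```python
# Hypothetical set of language codes where a verb may lack VerbForm.
# Only "id" (Indonesian) has been suggested in this thread; extend as needed.
LANGS_WITHOUT_VERBFORM = {"id"}


def zone_language(zone):
    """Return the part of a zone label before the first underscore.

    For example, "id" stays "id" and "id_gsd" becomes "id".
    """
    return zone.split("_")[0]


def verbform_required(zone):
    """Return True if a missing VerbForm should be reported for this zone."""
    return zone_language(zone) not in LANGS_WITHOUT_VERBFORM
```

Inside MarkBugs, the existing VerbForm test could presumably be guarded by something like verbform_required(node.root.zone). Note that an empty zone (no language supplied) keeps the requirement switched on, so the test is only relaxed when the language is explicitly known.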

OK. The second solution sounds more useful to me. I call Udapi from the UD treebank evaluator, and I can supply the language code via read.Conllu zone=xx, but I would prefer not to selectively turn off tests in the call.

There remains the question of whether language-specific options like this can be effectively maintained within MarkBugs (when they also have to be maintained in other places). But MarkBugs will probably need occasional updates in any case, including in the area of the universal guidelines.