dkpro/dkpro-jwktl

Multiple genders in (German) Wiktionary result in empty gender in JWKTL

Closed · 1 comment

Although the German Wiktionary defines the word 'Liter' as Substantiv, n, m, the resulting Entry has an empty Gender field (NULL). This defect is reproducible for every noun page that has a single entry with multiple genders (e.g. Cola). However, on pages with multiple entries where at least one gender definition is unambiguous and one is ambiguous (e.g. Spezi), a gender is assigned to all entries.

The bug is caused by the gender matching in parser/de/components/DEPartOfSpeechHandler.java, which does not cover all possible gender specifications.
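To illustrate the kind of matching that is needed, here is a minimal sketch that collects every gender marker from a header such as "Substantiv, n, m" instead of expecting exactly one. It is not the code in DEPartOfSpeechHandler.java; the class GenderSpecSketch, the enum Gender and the method parseGenders are hypothetical names, and the comma-separated header format is an assumption based on the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual JWKTL parser code.
class GenderSpecSketch {

    // Stand-in for a grammatical gender type (assumed for illustration).
    public enum Gender { MASCULINE, FEMININE, NEUTER }

    // Collects all gender markers (m, f, n) that follow the part-of-speech
    // label in a comma-separated header, keeping their original order.
    public static List<Gender> parseGenders(String header) {
        List<Gender> genders = new ArrayList<>();
        String[] parts = header.split(",");
        // parts[0] is the part-of-speech label (e.g. "Substantiv"); skip it.
        for (int i = 1; i < parts.length; i++) {
            switch (parts[i].trim()) {
                case "m": genders.add(Gender.MASCULINE); break;
                case "f": genders.add(Gender.FEMININE); break;
                case "n": genders.add(Gender.NEUTER); break;
                default: break; // ignore anything that is not a gender marker
            }
        }
        return genders;
    }

    public static void main(String[] args) {
        System.out.println(parseGenders("Substantiv, n, m"));
    }
}
```

With this approach a header with one marker yields a one-element list, and a header without any marker yields an empty list rather than a failed match.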

The API now supports a getGenders() method that returns a list of genders. For compatibility, getGender() may still be used to access the first gender in that list. So far, only the German part of speech handler properly supports the gender list.
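The relationship between the two accessors can be sketched as follows. This is a simplified stand-in, not the JWKTL implementation: the class EntrySketch and the enum Gender are hypothetical, and returning null for an entry without any gender is an assumption. Only the method names getGenders() and getGender() come from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an entry type; not the JWKTL class.
class EntrySketch {

    // Assumed gender type for illustration.
    public enum Gender { MASCULINE, FEMININE, NEUTER }

    private final List<Gender> genders;

    public EntrySketch(List<Gender> genders) {
        // Copy defensively so later changes to the argument do not leak in.
        this.genders = new ArrayList<>(genders);
    }

    // New-style accessor: all genders, in the order they were parsed.
    public List<Gender> getGenders() {
        return Collections.unmodifiableList(genders);
    }

    // Compatibility accessor: the first gender, or null if there is none
    // (the null return is an assumption of this sketch).
    public Gender getGender() {
        return genders.isEmpty() ? null : genders.get(0);
    }

    public static void main(String[] args) {
        List<Gender> parsed = new ArrayList<>();
        parsed.add(Gender.NEUTER);
        parsed.add(Gender.MASCULINE);
        EntrySketch liter = new EntrySketch(parsed);
        System.out.println(liter.getGenders());
        System.out.println(liter.getGender());
    }
}
```

Callers that only need one gender keep working unchanged, while callers that care about words like 'Liter' can iterate over the full list.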