dkpro/dkpro-uby

IMSLex-Subcat: Convert auxiliaries

chmeyer opened this issue · 2 comments

Auxiliaries are currently not converted. I have already prepared some code to extract the auxiliary from the original IMSLex files, but we need to decide how to represent verbs that take both "haben" and "sein". AFAIK, such verbs are specially tagged with "-variant" (?). Currently, EAuxiliary has no enum value for verbs that take both auxiliaries. Possible solutions are

  • adding a combined value "habenSein" to EAuxiliary or
  • duplicating the subcat frames with differing links to a haben- and sein-LexemeProperty.

Auxiliary and subcat frame together constitute a large part of a verb sense;
they must not be separated.

haben-variant should be represented as haben,
and sein-variant as sein.

The aux. information goes into the LexemeProperty, which is linked to Sense.

So it would be your second suggestion. See also the SubcategorizationFrame class, which models that subcat frame and aux. belong together:

// LexemeProperty of this SubcategorizationFrame
@VarType(type = EVarType.CHILD)
private LexemeProperty lexemeProperty;
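
For illustration, here is a minimal sketch of how the second suggestion might look during conversion. It does not use the real UBY classes: `Aux`, `LexProp`, and `Frame` are simplified, hypothetical stand-ins for EAuxiliary, LexemeProperty, and SubcategorizationFrame, and the tag spellings "haben-variant" / "sein-variant" are an assumption about the IMSLex data that would need to be checked against the original files.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;

public class AuxFrameSketch {

    // Hypothetical stand-in for EAuxiliary: only the two values discussed here.
    enum Aux { HABEN, SEIN }

    // Hypothetical stand-in for LexemeProperty: carries the auxiliary.
    static final class LexProp {
        final Aux auxiliary;
        LexProp(Aux auxiliary) { this.auxiliary = auxiliary; }
    }

    // Hypothetical stand-in for SubcategorizationFrame: owns exactly one LexProp.
    static final class Frame {
        final String label;
        final LexProp lexemeProperty;
        Frame(String label, LexProp lexemeProperty) {
            this.label = label;
            this.lexemeProperty = lexemeProperty;
        }
    }

    // Maps a source tag to an auxiliary; "haben-variant" becomes HABEN,
    // "sein-variant" becomes SEIN. The tag spellings are assumed, not verified.
    static Aux toAux(String tag) {
        String t = tag.trim().toLowerCase(Locale.ROOT);
        String suffix = "-variant";
        if (t.endsWith(suffix)) {
            t = t.substring(0, t.length() - suffix.length());
        }
        switch (t) {
            case "haben": return Aux.HABEN;
            case "sein":  return Aux.SEIN;
            default: throw new IllegalArgumentException("Unknown auxiliary tag: " + tag);
        }
    }

    // Emits one frame per distinct auxiliary, so a verb tagged with both
    // variants yields two frames that differ only in their LexProp.
    static List<Frame> expand(String frameLabel, List<String> auxTags) {
        EnumSet<Aux> distinct = EnumSet.noneOf(Aux.class);
        for (String tag : auxTags) {
            distinct.add(toAux(tag));
        }
        List<Frame> frames = new ArrayList<>();
        for (Aux aux : distinct) {
            frames.add(new Frame(frameLabel, new LexProp(aux)));
        }
        return frames;
    }
}
```

Under these assumptions, calling `AuxFrameSketch.expand` with a frame label and the tags "haben-variant" and "sein-variant" produces two frames with the same label, one linked to a haben property and one to a sein property, so no combined enum value is needed.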