The script converts each output tag of the emMorph morphological analyzer to the corresponding tag of a version of the Szeged Treebank.
- the main script of the converter: converter.py
- auxiliary files in the folder converterdata
- license
- this readme
A detailed description of the tagsets is available here.
emMorph is the current morphological analyzer for Hungarian and it is integrated into the e-magyar language processing toolchain. The list of emMorph tags is from here.
What we call CoNLL here is a modified version of the MULTEXT morphosyntactic tagset, transformed into a feature-value pair structure. This modified tagset is the annotation scheme of a version of the Szeged Treebank, the largest fully manually annotated corpus of Hungarian.
- standard input: token, lemma and emMorph tag, separated by tabs
- standard output: CoNLL tag
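The input format described above can be illustrated with a minimal Python sketch. It only shows how one tab-separated line splits into its three fields; it is not taken from converter.py. The helper name `read_rows`, the sample line and the skipping of empty lines are assumptions made for illustration.

```python
import io


def read_rows(stream):
    """Yield (token, lemma, emmorph_tag) triples from tab-separated lines.

    Hypothetical helper for illustration; not part of converter.py.
    """
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: empty lines carry no data and are skipped
        token, lemma, tag = line.split("\t")
        yield token, lemma, tag


# Illustrative sample line: token, lemma, emMorph tag, separated by tabs.
sample = "alma\talma\t[/N][Nom]\n"
rows = list(read_rows(io.StringIO(sample)))
```

In real use the stream would be `sys.stdin` instead of the in-memory sample, and each line would need exactly three tab-separated fields.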
Python 3
GNU General Public License v3.0