morphology converter from emmorph to conll

emmorph2conll

The script converts each output tag of the emMorph morphological analyzer to the corresponding tag used in a version of the Szeged Treebank.

What's in this repo?

  • the main script of the converter: converter.py
  • auxiliary files in folder converterdata
  • license
  • this readme

The tagsets 🇭🇺

A detailed description of the tagsets is available here.

emMorph

emMorph is the current morphological analyzer for Hungarian and it is integrated into the e-magyar language processing toolchain. The list of emMorph tags is from here.

CoNLL

What we call CoNLL here is a modified version of the MULTEXT morphosyntactic tagset, transformed into a feature-value pair structure. This modified tagset is the annotation scheme of a version of the Szeged Treebank, the largest fully manually annotated corpus of Hungarian.

How to use the converter?

  • standard input: token, lemma and emMorph tag, separated by tabs
  • standard output: the corresponding CoNLL tag
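
The input and output described above could be driven from Python roughly as follows. This is a minimal sketch, not part of the repository: the helper names `format_input` and `run_converter` are hypothetical, the sample tag `[/N][Nom]` is only an illustration of what an emMorph tag looks like, and it assumes that converter.py is started without arguments from the repository root and prints one tag per input line.

```python
import subprocess
import sys


def format_input(rows):
    """Join (token, lemma, emmorph_tag) triples into tab-separated lines."""
    return "".join("\t".join(row) + "\n" for row in rows)


def run_converter(rows, script="converter.py"):
    """Pipe the rows through the converter and return its output lines.

    Assumption: the script takes no arguments, reads standard input
    and writes its result to standard output.
    """
    result = subprocess.run(
        [sys.executable, script],
        input=format_input(rows),
        capture_output=True,
        encoding="utf-8",  # Hungarian text needs an explicit encoding
        check=True,
    )
    return result.stdout.splitlines()


# Illustrative input: token, lemma, emMorph tag
rows = [("alma", "alma", "[/N][Nom]")]
```

Calling `run_converter(rows)` from the folder that contains converter.py would then return the converter's output as a list of strings.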

Dependencies

Python 3

License

GNU General Public License v3.0

Our converters