THIS IS A WORK IN PROGRESS

Re-implementation of the system described in this paper, a POS tagger designed for low-resource languages. The goal is to make it available and usable for anyone facing low-resource issues in NLP.

MSETagger

POS tagging for low-resource languages, using specialized MorphoSyntactic Embeddings

Dependencies

The tagger is built on top of yaset, which provides the Bi-LSTM tagger, and mimick, which computes embeddings for out-of-vocabulary (OOV) words.

Requirements

SBT
a way to create virtual environments for Python
some data for your favorite language

Installation

clone the repo
create two virtual environments, one for Python 2 and one for Python 3 (sorry, yaset is in Python 3 and mimick in Python 2...), and run pip install -r requirements(2|3).txt in each
compile the Scala code with sbt compile
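
The installation steps above can be sketched as a small shell script, assuming it is run from the root of the cloned repo. The environment directory names venv2 and venv3 are hypothetical, and the script assumes that requirements(2|3).txt expands to requirements2.txt for Python 2 and requirements3.txt for Python 3. It is a dry run by default and only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch of the install steps; run from the root of the cloned repo.
# Assumptions: venv2/venv3 are hypothetical directory names, and the
# requirements files are named requirements2.txt and requirements3.txt.
set -e

# Dry run by default: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Python 2 environment (for mimick)
run virtualenv -p python2 venv2
run venv2/bin/pip install -r requirements2.txt

# Python 3 environment (for yaset)
run python3 -m venv venv3
run venv3/bin/pip install -r requirements3.txt

# Compile the Scala code
run sbt compile
```

Calling each environment's own pip (venv2/bin/pip, venv3/bin/pip) avoids having to activate and deactivate the environments inside the script.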

Usage

All options are set in the application.conf file, which includes: