Pashto Part-of-speech (POS) Tagger

Pashto POS Tagger is a machine learning model for part-of-speech tagging of Pashto text. It is trained on the Pashto Corpus using the Conditional Random Fields (CRF) algorithm.
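
As a rough illustration of what a CRF tagger takes as input, the sketch below turns each token of a sentence into a dictionary of features, including a little context from neighbouring tokens. The feature set and the names `token_features` and `sentence_features` are assumptions made for this example; they are not taken from this repository, and the model's actual features may differ.

```python
def token_features(sentence, i):
    """Build a feature dict for the token at position i (illustrative feature set)."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word": word,
        "prefix2": word[:2],      # first two characters
        "suffix2": word[-2:],     # last two characters
        "suffix3": word[-3:],     # last three characters
        "is_digit": word.isdigit(),
        "length": len(word),
    }
    # Context features: neighbouring words, or sentence-boundary markers.
    if i > 0:
        features["prev_word"] = sentence[i - 1]
    else:
        features["BOS"] = True    # beginning of sentence
    if i < len(sentence) - 1:
        features["next_word"] = sentence[i + 1]
    else:
        features["EOS"] = True    # end of sentence
    return features


def sentence_features(sentence):
    """Convert a tokenized sentence into a list of per-token feature dicts."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Hypothetical tokenized Pashto sentence, used only to exercise the functions.
example = ["زه", "کتاب", "لولم"]
feats = sentence_features(example)
```

Sequences of feature dictionaries like `feats`, paired with one tag per token, are the kind of training input that CRF libraries such as sklearn-crfsuite accept. Whether this repository uses that library is not stated here.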

This repository contains the source code for the paper “The Pashto Corpus and Machine Learning Model for Automatic POS Tagging”.

The Pashto Corpus used to train the model consists of 2 million words, each manually tagged with its part of speech. A sample of the corpus is available in the “data” directory, and a pre-trained model is in the “models” directory.

ijazul-haq/pashto_pos