
Rule-based Bengali stemmer written in Python

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Rule Based Bangla Stemmer


Installation

$ pip install py_bangla_stemmer

Usage

>>> from py_bangla_stemmer import BanglaStemmer
>>>
>>> stemmer = BanglaStemmer()
>>> stemmer.stem('জনপ্রিয়তা')  # 'জনপ্রি'
>>> stemmer.stem(' সেটাই')    # 'সে'

The following documentation is for further development of the stemmer. The stemming rules live in a file named common.rules in the py_bangla_stemmer/resources folder. Below is the information you need to change the rules.

$$X + n :$$

When X appears at the end of a word and the word length is at least n, remove it.

$$Y -> Z + n :$$

When Y appears at the end of a word and the word length is at least n, replace it with Z.

$$Y.Z -> A.B + n :$$

When Y, followed by some character a, followed by Z, appears at the end of a word and the word length is at least n, replace it with AaB. The middle character a is carried over unchanged.
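The three rule forms above can be illustrated with a small sketch. This is not the parser or matcher used by py_bangla_stemmer. The function name `apply_rule` and its parameters are hypothetical, and it assumes that "word length" means the number of characters in the word before the rule is applied. Latin placeholder strings are used so the matching is easy to follow.

```python
from typing import Optional


def apply_rule(word: str, pattern: str, replacement: str = "",
               min_len: int = 0) -> Optional[str]:
    """Apply one suffix rule (hypothetical helper, not the library's API).

    Returns the rewritten word, or None if the rule does not apply.
    A '.' in `pattern` stands for exactly one arbitrary character, which is
    copied into the position of the '.' in `replacement` (so `replacement`
    must also contain a '.' in that case).
    """
    # Assumption: the length check uses the word before any change.
    if len(word) < min_len:
        return None

    if "." in pattern:
        # Form: Y.Z -> A.B + n
        head, tail = pattern.split(".", 1)
        span = len(head) + 1 + len(tail)
        if len(word) < span:
            return None
        suffix = word[len(word) - span:]
        if not (suffix.startswith(head) and suffix.endswith(tail)):
            return None
        middle = suffix[len(head)]  # the character matched by '.'
        new_head, new_tail = replacement.split(".", 1)
        return word[:len(word) - span] + new_head + middle + new_tail

    # Forms: X + n (replacement is empty) and Y -> Z + n
    if not word.endswith(pattern):
        return None
    return word[:len(word) - len(pattern)] + replacement


print(apply_rule("walking", "ing", min_len=5))       # X + n
print(apply_rule("parties", "ies", "y", min_len=4))  # Y -> Z + n
print(apply_rule("fooacb", "a.b", "x.y", min_len=3)) # Y.Z -> A.B + n
```

Note that Python's `len` counts Unicode code points, so for Bengali text a vowel sign or hasanta counts as its own character. Whether the library measures length the same way is not stated here.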