Code base for the paper "N-gram Statistical Stemmer for Bangla Corpus".
Dept. of Language Science and Technology, Saarland University
Email: ataur[at]coli[dot]
- Run either version of the stemmer: min (based on minimum edit distance) or kmeans (based on K-means clustering)
- Running time depends on the size of the input data, so a large corpus may take a while to process
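
To give a feel for what "based on minimum edit distance" means, here is a minimal sketch in Python. It is not the code from this repository: the function names `edit_distance` and `nearest_stem` are hypothetical, the English words are only for illustration, and the paper's actual clustering and stem-selection steps may differ. The sketch assumes that a word is mapped to whichever candidate stem needs the fewest single-character edits.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def nearest_stem(word: str, candidates: list[str]) -> str:
    """Hypothetical helper: pick the candidate closest to `word`.

    Ties are broken by preferring the shorter candidate, then alphabetical order.
    """
    return min(candidates, key=lambda c: (edit_distance(word, c), len(c), c))


if __name__ == "__main__":
    print(nearest_stem("playing", ["play", "pray", "clay"]))
```

Python compares strings code point by code point, so the same functions accept Bangla text, although a real stemmer would likely need to treat combining vowel signs with more care than this sketch does.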
doi = {10.48550/ARXIV.1912.11612},
url = {},
author = {Sadia, Rabeya and Rahman, Md Ataur and Seddiqui, Md Hanif},
keywords = {Computation and Language (cs.CL), Information Retrieval (cs.IR), FOS: Computer and information sciences},
title = {N-gram Statistical Stemmer for Bangla Corpus},
publisher = {arXiv},
year = {2019},
copyright = { perpetual, non-exclusive license}