Polish stemmer
nmapx opened this issue · 9 comments
Do you plan to add a Polish stemmer algorithm as well?
There are at least 3 different libraries that provide Polish stemming, but they only support Polish, and that's my problem. Your lib is much bigger, and I would be happy to use it for all the languages I need to handle.
I can help with that, but since you are the author, just let me know whether any of these makes sense to you.
https://github.com/dzieciou/pystempel
https://github.com/morfologik/morfologik-stemming
http://www.getopt.org/stempel/
These 3 libs don't implement a Snowball algorithm; they are custom stemmers.
As stated here: http://www.getopt.org/stempel/
There are many existing and well-known implementations of stemmers for English (Porter, Lovins, Krovetz) and other European languages (Snowball). There are also good quality commercial lemmatizers for Polish. However, there is only one freely available Polish stemmer, implemented by Dawid Weiss, based on the "ispell" dictionary and Jan Daciuk's FSA package.
Without a Snowball definition for Polish, it will be impossible to implement it here.
http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en
You mean this one?
So there is this note on that page:
Lametyzator/Stempelator and FSA code are now part of the Morfologik project.
Which points to https://github.com/morfologik/morfologik-stemming
There is not much data when I search for snowball and polish. I found only this one: https://github.com/Uncpy/Old-Polish-language-stemmer-in-Snowball
I can't vouch for its quality.
Looks like that one translates from old Polish to new Polish.
And what about morfologik-stemming?
No, but I did look now, and it's not a Snowball stemming algorithm. I can work with Snowball algorithms, but I'm not currently interested in diving into anything else; I just don't have the time. Implementing a Snowball stemmer takes me a couple of hours, and that's what I can do now.