wamania/php-stemmer

Polish stemmer

nmapx opened this issue · 9 comments

nmapx commented

Do you plan to add a Polish stemmer algorithm as well?

I'd actually be happy to add Polish (or any other language, really). All I need is the stemmer algorithm laid out (like this) and a sample word list with the correct stems (like this). Unfortunately the Snowball stemmer does not have an algorithm for Polish available.
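To illustrate how such a word list could be used, here is a minimal sketch of a check that runs a stemmer over word/stem pairs and reports the mismatches. It is written in Python for brevity and is not this library's code. The two-column "word expected_stem" format, the names `parse_pairs` and `find_mismatches`, and the placeholder `toy_stem` are all assumptions made up for this example.

```python
from typing import Callable, Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'word expected_stem' lines (assumed format), skipping other lines."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def find_mismatches(
    stem: Callable[[str], str], pairs: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (word, expected, actual) for every word the stemmer gets wrong."""
    mismatches = []
    for word, expected in pairs:
        actual = stem(word)
        if actual != expected:
            mismatches.append((word, expected, actual))
    return mismatches


def toy_stem(word: str) -> str:
    """Hypothetical placeholder: strips one trailing 's'. Not a real algorithm."""
    return word[:-1] if word.endswith("s") else word


# Tiny made-up sample; a real list would contain thousands of entries.
SAMPLE = """
cats cat
dogs dog
running run
"""

pairs = parse_pairs(SAMPLE.splitlines())
mismatches = find_mismatches(toy_stem, pairs)
for word, expected, actual in mismatches:
    print(f"{word}: expected {expected!r}, got {actual!r}")
```

Any real stemmer can be passed in place of `toy_stem`. An empty mismatch list means the implementation agrees with the reference stems.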

nmapx commented

There are at least three libraries that provide Polish stemming, but they only support Polish, and that's my problem. Your library covers many more languages, and I would be happy to use it for all the languages I need to handle.
I can help you with that, but since you are the author, just let me know if any of these makes sense to you.
https://github.com/dzieciou/pystempel
https://github.com/morfologik/morfologik-stemming
http://www.getopt.org/stempel/

These three libraries don't implement a Snowball algorithm; they use custom stemmers.
As stated at http://www.getopt.org/stempel/:

There are many existing and well-known implementations of stemmers for English (Porter, Lovins, Krovetz) and other European languages (Snowball). There are also good quality commercial lemmatizers for Polish. However, there is only one freely available Polish stemmer, implemented by Dawid Weiss, based on the "ispell" dictionary and Jan Daciuk's FSA package.

Without a Snowball algorithm definition for Polish, it will be impossible to implement it here.

nmapx commented

http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en
Do you mean this one?
The page states:
Lametyzator/Stempelator and FSA code are now part of the Morfologik project.
That links to https://github.com/morfologik/morfologik-stemming

nmapx commented

There isn't much to find when I search for Snowball and Polish. I found only this one: https://github.com/Uncpy/Old-Polish-language-stemmer-in-Snowball
I don't know how good it is.

Looks like that one translates from old Polish to new Polish.

nmapx commented

And what about morfologik-stemming?

nmapx commented

@msaari have you had a chance to take a look? 😅

No, but I did look now, and it's not a Snowball stemming algorithm. I can work with Snowball algorithms, but I'm not currently interested in diving into anything else; I just don't have the time. Implementing a Snowball stemmer takes me a couple of hours, and that's what I can do now.