shibukawa/snowball_py

Spanish algorithm

stefanocoding opened this issue · 4 comments

Hi,
thanks for sharing this code.

I've been playing with the Spanish stemmer and I noticed something wrong with the word "años". "años" means "years" and "año" means "year".
What I noticed is that "años" is not reduced to "año", the way "years" is reduced to "year".

Does anyone know whether this is on purpose or an error?
I haven't read the algorithm yet.
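
A minimal sketch of how the observation above could be checked, assuming a stemmer is available as a plain function from word to stem. The helper names `share_stem` and `describe` are hypothetical, and the optional part at the bottom assumes the `snowballstemmer` package from PyPI is installed; no particular output is claimed here.

```python
from typing import Callable


def share_stem(stem: Callable[[str], str], first: str, second: str) -> bool:
    """Return True when the stemmer reduces both words to the same stem."""
    return stem(first) == stem(second)


def describe(stem: Callable[[str], str], first: str, second: str) -> str:
    """Build a one-line report showing both stems and whether they match."""
    a, b = stem(first), stem(second)
    verdict = "same stem" if a == b else "different stems"
    return f"{first} -> {a}, {second} -> {b}: {verdict}"


if __name__ == "__main__":
    try:
        # Assumption: the snowballstemmer package is installed.
        import snowballstemmer
    except ImportError:
        print("snowballstemmer is not installed; pass any other stem function")
    else:
        spanish = snowballstemmer.stemmer("spanish")
        print(describe(spanish.stemWord, "año", "años"))
        print(describe(spanish.stemWord, "year", "years"))
```

Taking the stemmer as a parameter keeps the check independent of any one implementation, so the same comparison can be run against this port and against upstream.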

Hi

I only wrote a compiler for the DSL used by https://github.com/snowballstem/snowball/blob/master/algorithms/spanish/stem_ISO_8859_1.sbl . To improve the quality, changes should be committed to the upstream Snowball project.

The Snowball stemmer doesn't provide perfect stemming algorithms (that may be impossible). It provides rough logic that is good enough in most cases. So when a result is wrong, its exact classification is not a bug but "not implemented".

The Snowball developers and I used the following files for verification. Currently año (line 1706) and años (line 1715) seem to have different results. I don't know whether that is intended or not.

https://github.com/snowballstem/snowball-data/tree/master/spanish
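
A rough sketch of how a word could be looked up in such verification data, assuming the directory holds a vocabulary file and an expected-stem file that are aligned line by line. The file names `voc.txt` and `output.txt` in the usage comment are assumptions, and `find_stem` and `lookup_in_files` are hypothetical helper names.

```python
from typing import Iterable, Optional, Tuple


def find_stem(
    vocabulary: Iterable[str], stems: Iterable[str], word: str
) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, expected stem) for word, or None if absent.

    Assumes line N of `stems` holds the expected stem for line N of `vocabulary`.
    """
    pairs = zip(vocabulary, stems)
    for line_number, (entry, stem) in enumerate(pairs, start=1):
        if entry.strip() == word:
            return line_number, stem.strip()
    return None


def lookup_in_files(
    voc_path: str, output_path: str, word: str, encoding: str = "utf-8"
) -> Optional[Tuple[int, str]]:
    """Open both files and look the word up with find_stem."""
    with open(voc_path, encoding=encoding) as voc:
        with open(output_path, encoding=encoding) as out:
            return find_stem(voc, out, word)


# Possible usage, with assumed file names:
#   lookup_in_files("voc.txt", "output.txt", "años")
```

Returning the line number along with the stem makes it easy to cross-check against line references like the ones quoted above.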

Thank you for the explanation @shibukawa.
Maybe it would be better if I created the issue on https://github.com/snowballstem/snowball?

I've been studying Spanish for two weeks 😃

Okay, I'm going to do it.

Really? I'm a native speaker. It's a difficult language. Duolingo and Busuu are great places to learn the language. :)